Keywords:

  • Computational statistics;
  • computer-intensive methods;
  • covariogram;
  • microscopy;
  • prediction;
  • sampling theory;
  • simulation;
  • stereology;
  • systematic sampling;
  • transitive theory;
  • variance;
  • volume

Abstract

The Cavalieri method provides an unbiased estimator of the total volume of a body from its transectional areas on systematic sections. The coefficient of error (CE) of the Cavalieri estimator was predicted by a computer-intensive method. The method is based on polynomial regression of the area values on section number and on simulation of systematic sectioning. The measurement function is modelled as a quadratic polynomial with a superimposed error term. The relative influence of the trend and the error component is estimated by analysis-of-variance techniques. This predictor was compared with two established short-cut estimators of the CE based on transitive theory. First, all predictors were applied to data sets from six deterministic models with analytically known CE. For these models, the CE was best predicted by the older short-cut estimator and by the computer-intensive approach if the measurement function had finite jumps, whereas the newer short-cut estimator gave the best prediction when the measurement function was continuous. The predictors were also applied to published empirical data sets. The first data set consisted of 10 series of areas of systematically sectioned rat hearts with 10–13 items each; the second consisted of 13 series of systematically sampled transectional areas of various biological structures with 38–90 items each. On the whole, similar mean values of the predicted CE were obtained with the older short-cut estimator and with the computer-intensive method. These were of the same order of magnitude as resampling estimates of the CE from the empirical data sets, which were used as a cross-check. The mean values according to the newer short-cut CE estimator were distinctly lower than the resampling estimates. For individual data sets, however, any of the three methods could provide the prediction closest to the cross-check value. This finding is discussed in terms of the statistical variability of the resampling estimate itself.


(1) Introduction

Systematic sampling designs are of major importance in stereology. Equidistant windows of observation are placed into an object, whereby the location of one window determines the location of all other windows (Cochran, 1977). The classical example of one-dimensional systematic sampling is volume estimation by Cavalieri sampling, where the area contents Ai of n sectional planes through a body K, spaced a distance d apart, are estimated or measured along an arbitrarily orientated spatial axis (the x-axis) orthogonal to the planes. Let us denote the projected height of the body onto the x-axis as h and its volume as V. Provided that the location of one of the planes is selected uniformly at random in the interval [0, d), the Cavalieri estimator reads:

  • \hat{V} = d \sum_{i=1}^{n} A_i    (1)

For the sake of illustration, the origin of the x-axis may be identified, e.g. with the lower tangent point of the orthogonal planes to K. The Cavalieri estimator is unbiased for the volume estimation of bodies of arbitrary shape and anisotropy properties and is also highly efficient. Systematic sections usually lead to volume estimates with much less variance than a hypothetical volume estimate, which would result, e.g. from the product of h (supposed to be known) with the mean value of Ai taken from parallel sections through K with independent random positions along the x-axis. The squared coefficient of error (CE) in the latter case, which corresponds to a simple random sample, declines in the order of 1/n, whereas under systematic sampling it usually declines in the order of 1/n^2 to 1/n^4, depending on the geometry of the object. Moreover, systematic sampling is the natural approach and simpler to realize in applications, as compared to simple random sampling. In biomedical imaging techniques, for example, observations are recorded at registered equidistant planes perpendicular to a vertical axis by many devices by default (scans from computer axial tomography, nuclear magnetic resonance tomography, confocal laser scanning microscopy, etc.).
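
For readers who wish to reproduce the computations, a minimal sketch of Eq. (1) in Python follows (an illustration only; the function name and the example numbers are hypothetical):

    import numpy as np

    def cavalieri_volume(areas, d):
        """Cavalieri estimator of Eq. (1): section distance times the sum
        of the transectional areas."""
        return d * np.sum(areas)

    # Hypothetical example: five areas (cm^2) on sections 0.2 cm apart.
    areas = [1.2, 3.4, 4.1, 2.8, 0.9]
    print(cavalieri_volume(areas, d=0.2))   # -> 2.48 (cm^3)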

Although the Cavalieri estimator of the absolute volume of single objects is unbiased, the problem of predicting its CE from a given data set of areas alone is not solved in general. Various estimators (predictors) have been suggested for this purpose (e.g. Gundersen & Jensen, 1987; Mattfeldt, 1987, 1989; Cruz-Orive, 1997, 1999; Gundersen et al., 1999; García-Fiñana & Cruz-Orive, 2000a, 2000b, 2004; García-Fiñana et al., 2003). In the present paper, a computer-intensive method for CE prediction of the Cavalieri estimator from a set of empirical data is presented. Polynomial regression methods are used to decompose the area series into a deterministic component and an error component. Systematic sectioning is simulated within the computer in a deterministic manner after fitting cubic splines to the data points. Combining the results of both steps yields an estimate of the total CE of the Cavalieri estimator that accounts for both the deterministic and the error component. The CE estimation by the computer-intensive method was compared with two short-cut estimators, which are based on a quadratic approximation to the estimated covariogram of the data (CEsh1: Gundersen & Jensen, 1987; CEsh2: Gundersen et al., 1999). Both estimators are based on transitive theory (Matheron, 1965, 1971). For the purpose of the comparisons, we use six analytical models of f(x) as well as previously published data sets of empirical areas of transected planes resulting from one-dimensional systematic sectioning designs (Mattfeldt, 1987: Table 1; Gundersen et al., 1999: data sets V1–V13).

Table 1. Heidelberg rat heart data. Resampling: 1 sample → 2 overlapping subsamples. The asterisk marks, for each case, the predictor (CEtot, CEsh1 or CEsh2) lying closest to the resampling estimate CEres.

No.    CEsys     R2        CEran     CEres     CEtot      CEsh1      CEsh2
 1     0.05944   0.98405   0.12110   0.05173   0.06086    0.04448*   0.00994
 2     0.04509   0.97055   0.12032   0.00663   0.04509    0.03780    0.00845*
 3     0.06391   0.98010   0.13307   0.06669   0.06579*   0.04580    0.01024
 4     0.06423   0.99385   0.12515   0.05654   0.06423    0.04909*   0.01097
 5     0.05294   0.99340   0.13756   0.02736   0.05294    0.04402*   0.00984
 6     0.05751   0.99520   0.12214   0.04890   0.05801    0.04423*   0.00989
 7     0.04730   0.98435   0.14278   0.03211   0.05043    0.03649*   0.00816
 8     0.07298   0.99730   0.12472   0.03653   0.07308    0.05498*   0.01229
 9     0.05451   0.97865   0.13857   0.05570   0.05756*   0.04774    0.01067
10     0.04627   0.98170   0.12146   0.00756   0.04870    0.03639    0.00813*
Mean   0.05640   0.98592   0.12868   0.03897   0.05766    0.04410    0.00985

(2) Methods

2.1. Fundamental concepts

Essentially, one-dimensional systematic sampling reduces to the estimation of the area between the function y = f(x) and the x-axis from a set of yi-values at equidistant values xi, i.e. (x1, y1), (x2, y2), … , (xn, yn). It is assumed that f(x) is everywhere bounded and nonnegative. The function f(x) has been denoted as the area function or measurement function (for details see Cruz-Orive, 1989, 1999; García-Fiñana & Cruz-Orive, 2004). In general, f(x) may be deterministic, random, or a combination of both (deterministic with noise superimposed).

2.2. Deterministic models

If f(x) is deterministic, analytical solutions can be found which provide CE(m) as a function of m = h/d for various simple mathematical models, where m is the mean number of sections with positive area content that the systematic sections generate in K. Clearly, CE(m) generally declines with increasing m, but not always monotonically. For various models, CE(m) from systematic samples shows a strongly fluctuating behaviour called ‘Zitterbewegung’. In general, the variance consists of three terms: the extension term, the Zitterbewegung, and terms of higher order; usually only the extension term is used for CE prediction (García-Fiñana & Cruz-Orive, 2000b). The Zitterbewegung implies that the CE(m) of a Cavalieri estimator may increase although the distance between sections decreases. The CE(m) for specific analytical models is shown in Fig. 2(a–f).

Figure 2. The plots show the mean section number on the abscissa and the CE on the ordinate for the analytical models M1–M6 (see insets). The full-drawn curves indicate the true analytical values (see Mattfeldt, 1989). The crosses show the computer-intensive prediction (CEsys); the stars and the diamonds represent the old and the new version of the short-cut estimators, CEsh1 and CEsh2. Note the strong ‘Zitterbewegung’ of the analytical functions, which is faintly reflected in the plots of CEsys. (a) Results for model M1. CEsh1 and CEsys lie near the tops of the analytical values, whereas CEsh2 runs near the valleys. (b) For the linear model M2, the CE is acceptably predicted by CEsh1 and CEsys, whereas CEsh2 is too low. (c,d) For models M3 and M4, the CE is overestimated by CEsys and CEsh1, whereas the prediction by CEsh2 lies nicely in the middle region of the analytical functions, i.e. between the hills and valleys of the ‘Zitterbewegung’. (e) For model M5, predictions by CEsh1 and CEsys are too high, whereas the CEsh2 values are too low. Compare with García-Fiñana & Cruz-Orive (2004), p. 261, Fig. 3b. (f) For M6, i.e. linear integration of an isosceles triangle, CEsh1 and CEsys generally tend to overestimate the true CE, whereas CEsh2 provides an acceptable prediction.

2.3. Data sets with deterministic and noise components

Up to now we have considered f(x) to be a deterministic function of x without error. However, it is natural to consider noise (random error) superimposed onto f(x), which may result from two main sources. First, the function f(x) may itself represent a basically deterministic process superposed with random fluctuations that truly exist in the geometry of the object (irregular anatomical in- and outpouchings of a body, erratic ‘bumps and lumps’ on an otherwise smooth surface). Second, in real applications it is generally not feasible to measure the areas of the systematically transected objects entirely without error. This holds for planimetry of tomographical scans and also for stereological estimation of sectional areas, e.g. by point counting using a planar grid. In the latter approach the estimated sectional areas are random variables depending on the positions of the point grids in the sectional planes (see Cruz-Orive, 1999). In practice, one is confronted with the series of the area values only. Usually there is no a priori knowledge on the relative strength of signal and noise. If their relation is needed for CE estimation, it must be inferred from the data themselves. Ideally, one would like to estimate a quantity from the data that is 1 in the case of a purely deterministic signal and 0 in the case of a purely random signal (e.g. white noise).

Methods to estimate such a quantity are well known from time series analysis. In principle, a series of systematically sampled scalar values (e.g. areas) along a spatial axis can also be treated with methods of time series analysis (Mattfeldt, 1987, 1989, 1997). In this context, we consider the positions of the planes along the axis as the independent variable x, the measurement function y = f(x) as the dependent variable, and the systematically sampled areas Ai as observations of y at regular intervals. One of the most elementary approaches to such a signal in time-series analysis is regression of y on x (e.g. Shumway, 1988). In the simplest case this is a linear regression, where the statistical model is:

  • y = a_1 x + a_0 + \varepsilon    (2)

In this case it is assumed that the true relation between y and x is linear, and that all deviations of the y-values from the line are due to the random error term ɛ, which is supposed to be uncorrelated with x and normally distributed. The parameters a0 and a1 are real constants. Using analysis of variance techniques, the observed total variance of the y-data may be decomposed into a variance component explained by the model and a residual variance component unexplained by the model. It is common to express the ratio of the explained variance to the total observed variance as R2; in the context of linear models this quantity is also denoted as the coefficient of determination (Sachs, 2002). It is convenient to write it in the form:

  • R^2 = 1 - TSE/TSS    (3)

where TSE denotes the total sum of squares due to error, and TSS is the total sum of squares (SAS Institute, 2000; pp. 39–43). Let us use the quantity R2 to characterize the signal-to-noise ratio of our empirical data sets. It is first necessary to decide on a class of parametric models for f(x) that is realistic for the intended type of application. The following considerations are based on experience with biological structures such as organs and cells. In many cases the cross-sectional areas begin their series with low values near zero, then reach a maximum smoothly, and then decline smoothly again. This is the case in solid organs or cells that exhibit some degree of symmetry, e.g. a kidney, a lymph node, or a synoviocyte (Gundersen & Jensen, 1987). In other specimens the area values rise smoothly from near zero, attain their maximum and thereafter fall to zero rather sharply. An example of this behaviour, which one finds in organs with little or no symmetry, is the heart (Mattfeldt, 1987, 1989); for another example see Gundersen & Jensen (1987), Fig. 2. Both of these cases may be modelled in first approximation by a polynomial of second degree with superimposed error, which is rather flexible. Hence, for a set of empirical data it is suggested to perform a regression of the Ai on x by fitting a second-degree polynomial to the data according to the principle of least squares. For a second-degree polynomial we have:

  • y = a_2 x^2 + a_1 x + a_0 + \varepsilon    (4)

In the course of this polynomial regression, we obtain an estimate of R2, which can be used in conjunction with an estimate of the deterministic component of the variance of the systematic sample (section 2.4) to estimate its overall variance.
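
A minimal sketch of this regression step (Eqs 3 and 4) in Python, assuming NumPy is available; the function name and the example series are hypothetical:

    import numpy as np

    def quadratic_r2(areas):
        """Fit the second-degree polynomial of Eq. (4) to the area series by
        least squares and return R^2 = 1 - TSE/TSS (Eq. 3)."""
        areas = np.asarray(areas, dtype=float)
        x = np.arange(len(areas)) + 0.5            # section positions 0.5, 1.5, ...
        coeffs = np.polyfit(x, areas, deg=2)       # a2, a1, a0
        fitted = np.polyval(coeffs, x)
        tse = np.sum((areas - fitted) ** 2)        # residual sum of squares (TSE)
        tss = np.sum((areas - areas.mean()) ** 2)  # total sum of squares (TSS)
        return 1.0 - tse / tss

    # Hypothetical, roughly dome-shaped area series:
    print(quadratic_r2([0.3, 1.1, 2.0, 2.4, 2.1, 1.2, 0.4]))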

2.4. Prediction of the variance of the Cavalieri estimator using computer-intensive methods

Computer-intensive methods (computational statistics) include data-analytical procedures that involve a huge number of often highly repetitive computations. Usually the following domains are classified as computer-intensive: simulation methods, Monte Carlo methods, resampling techniques (bootstrap, jackknife), permutation and randomization tests (Noreen, 1989; Manly, 1997; Ludbrook, 2000; Mattfeldt & Fleischer, 2005).

In previous work on computer-intensive methods for estimation of the CE in systematic sampling, Monte Carlo methods were used (Mattfeldt, 1987, 1989). Essentially, systematic sectioning was simulated many times by selecting uniform random positions of the first plane in the interval [0, 1), and the volume of the body K was estimated virtually by the Cavalieri principle from values of the estimated measurement function at equidistant positions (see also Cruz-Orive & Myking, 1981). The functional values between the data points were obtained by linear interpolation. A slightly modified and deterministic approach was used in the present study. Instead of drawing 100 random locations in [0, 1), 100 positions of the probe were placed in this interval at a constant spacing of 0.01, i.e. in the 100 simulations the first plane was positioned deterministically at the abscissas 0, 0.01, 0.02, … , 0.99. Also in contrast to the previous investigation, cubic spline interpolation between the data points was used. For each start position a virtual systematic sample of areas through the body K arises; thus 100 virtual systematic samples leading to 100 volume estimates were obtained. From these data, the CV (ratio of standard deviation to mean value) was computed. The latter was used as an estimator of the deterministic component of the coefficient of error of the Cavalieri estimator, CEsys, i.e. the error component due to the systematic sectioning itself.
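
This simulation step may be sketched as follows (an illustration, not the original program): SciPy's CubicSpline is used for the interpolation, and the boundary zeros are placed at x = 0 and x = n as in the captions of Figs 1 and 4 (the worked example in Section 2.6 places the upper boundary half a unit higher; either convention is a one-line change).

    import numpy as np
    from scipy.interpolate import CubicSpline

    def ce_sys(areas, n_start=100):
        """Deterministic simulation of Cavalieri sampling (sketch of Section 2.4).
        Areas are placed at x = 0.5, 1.5, ..., boundary zeros at x = 0 and x = n,
        a cubic spline is fitted, and each of n_start equidistant start points in
        [0, 1) generates one virtual systematic sample.  The CV of the resulting
        volume estimates is returned as CEsys."""
        areas = np.asarray(areas, dtype=float)
        n = len(areas)
        spline = CubicSpline(np.r_[0.0, np.arange(n) + 0.5, float(n)],
                             np.r_[0.0, areas, 0.0])
        estimates = []
        for start in np.arange(n_start) / n_start:    # 0, 0.01, ..., 0.99
            xs = np.arange(start, n, 1.0)             # virtual section positions
            estimates.append(spline(xs).sum())        # ~ volume, up to the factor d
        estimates = np.asarray(estimates)
        return estimates.std() / estimates.mean()

    print(ce_sys([0.3, 1.1, 2.0, 2.4, 2.1, 1.2, 0.4]))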

The aforementioned estimate CEsys is too low for empirical data with superimposed error (see also Gundersen et al., 1999). To estimate the CE of the Cavalieri estimator with contributions from systematic sectioning as well as from error, the following considerations seem appropriate. (i) If the measurement function were free from error, the variance of the Cavalieri estimator could be directly estimated from deterministic sectioning simulation, as described above (CEsys). (ii) If the measurement function were only white noise, i.e. random fluctuations around a horizontal line, the CE of the Cavalieri estimator would be approximately equal to the CE of a simple random sample, CEran. (iii) If the data gave rise to a mixed model with a deterministic and a random error component, the influence of the deterministic component on the CE estimate should be weighted in proportion to the signal-to-noise ratio of the signal. (iv) The coefficient of determination R2 may be used as such a signal-to-noise ratio. Taking all these considerations together, it is suggested to proceed in the following manner: determine CEsys for the systematic sampling by simulation, and separately CEran by using the standard formula for the CE of independent observations. This yields the two points (0, CEran^2) and (1, CEsys^2) in the (R2, CE^2)-plane for the sample. It is suggested to estimate the total CE^2 = CEtot^2 by linear interpolation at the R2-value of the sample between 0 and 1, which provides a value between CEran^2 and CEsys^2. The estimator reads:

  • CE_{tot}^2 = R^2 \, CE_{sys}^2 + (1 - R^2) \, CE_{ran}^2    (5)

It has the desired property of yielding CEtot = CEsys for R2 = 1 and CEtot = CEran for R2 = 0, and it provides intermediate values of CEtot for R2-values between 0 and 1.
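
As a numerical illustration of Eq. (5), with hypothetical round numbers that are not taken from the tables: suppose the quadratic fit yields R2 = 0.8, the sectioning simulation yields CEsys = 0.03, and the formula for independent observations yields CEran = 0.12. Then

    CE_{tot}^2 = 0.8 \cdot 0.03^2 + 0.2 \cdot 0.12^2 = 0.00072 + 0.00288 = 0.0036, \qquad CE_{tot} = 0.06.

Even with R2 = 0.8, the noise contribution dominates in this example because CEran is four times CEsys.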

2.5. Short-cut estimators of the CE of systematic samples based on transitive theory

Short-cut estimators of the CE of a systematic sample in general statistics are based on various principles such as autocorrelation, first/second differences, interpenetrating sample sums, and others (see References in Mattfeldt, 1989). In the stereological context, short-cut estimators based on transitive theory have been used most often (Gundersen & Jensen, 1987; Gundersen et al., 1999). Hence they were used in this study for the purpose of comparison with the computer-intensive estimator. The equations read:

  • CE_{sh1}(\hat{V}) = \frac{1}{\sum_{i=1}^{n} A_i} \sqrt{\frac{3\sum_{i=1}^{n} A_i^2 - 4\sum_{i=1}^{n-1} A_i A_{i+1} + \sum_{i=1}^{n-2} A_i A_{i+2}}{12}}    (6a)
  • CE_{sh2}(\hat{V}) = \frac{1}{\sum_{i=1}^{n} A_i} \sqrt{\frac{3\sum_{i=1}^{n} A_i^2 - 4\sum_{i=1}^{n-1} A_i A_{i+1} + \sum_{i=1}^{n-2} A_i A_{i+2}}{240}}    (6b)

(Gundersen & Jensen, 1987; Cruz-Orive, 1993; Gundersen et al., 1999). While the computer-intensive methods of the previous section require the use of a PC or workstation, computation of a short-cut estimator from a small to moderate data set is feasible even with a pocket calculator. The equations are identical except for a constant factor α that amounts to 1/12 or 1/240, respectively. Thus, the estimate of the variance is decreased by a factor of 20 when the second formula is used. Hence, the choice makes a real difference and is not ‘l’art pour l’art’ when applied to real data.
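
A minimal sketch of both predictors in Python, assuming the classical quadratic-covariogram form of Eqs (6a) and (6b) without a correction for nugget variance (function name and example series are hypothetical):

    import numpy as np

    def ce_shortcut(areas, alpha):
        """Short-cut CE predictors of Eqs (6a)/(6b): alpha = 1/12 gives CEsh1,
        alpha = 1/240 gives CEsh2 (no nugget-variance correction)."""
        a = np.asarray(areas, dtype=float)
        A = np.sum(a * a)              # sum of A_i^2
        B = np.sum(a[:-1] * a[1:])     # sum of A_i * A_(i+1)
        C = np.sum(a[:-2] * a[2:])     # sum of A_i * A_(i+2)
        # For very irregular series 3A - 4B + C can become negative;
        # the quadratic approximation is then not applicable.
        return np.sqrt(alpha * (3.0 * A - 4.0 * B + C)) / np.sum(a)

    areas = [0.3, 1.1, 2.0, 2.4, 2.1, 1.2, 0.4]   # hypothetical area series
    print(ce_shortcut(areas, 1 / 12))              # CEsh1 (0-objects)
    print(ce_shortcut(areas, 1 / 240))             # CEsh2 (1-objects)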

For a proper understanding of these equations, the following brief informal review seems appropriate. The roots of transitive theory come from the work of Matheron on the theory of regionalized variables (Matheron, 1965, 1971). The term ‘transition’ refers to the transition of the measurement function from positive values to 0 everywhere outside of its domain of definition. Typical for the transitive approach is the prediction of the CE in systematic sampling from the covariogram, rather than from the measurement function itself. The covariogram g(t) is related to the measurement function y = f(x) by the equation:

  • g(t) = \int_{-\infty}^{+\infty} f(x)\, f(x+t)\, dx    (7)

For analytical functions y = f(x) the covariogram can often be found directly by integration. For real data, the covariogram must be estimated from the data (empirical covariogram). In principle, the transitive approach distinguishes between deterministic data sets and data sets with random error superimposed, which is here called ‘nugget variance’ (Cruz-Orive, 1989, 1999). Following the classical transitive approach, objects are classified into two categories according to their smoothness class m, which is an integer and may assume the values 0 or 1. Thus, we have 0-objects and 1-objects on the basis of the smoothness class of the measurement function. A 0-object has one or more finite jumps (discontinuities) in its domain of definition, whereas the measurement function of a 1-object is continuous. For example, a sphere is a 1-object, whereas a transversely cut mushroom is a 0-object (García-Fiñana & Cruz-Orive, 2000a, 2000b). The m-class determines whether the estimator CEsh1 or CEsh2 is to be recommended: CEsh1 for 0-objects and CEsh2 for 1-objects. The analytical models M1 and M2 given below are examples of 0-objects because they have one or more finite jumps at the ends of their support. The models M3, M4 and M6 are examples of 1-objects because they are continuous and their first derivatives have no unbounded jumps. The model M5 below cannot be classified as a 0-object or a 1-object: although it is continuous, its first derivative shows unbounded jumps at the ends of its support. It is a counterexample to the current transitive theory (García-Fiñana & Cruz-Orive, 2000a, 2000b, 2004). Refined short-cut CE predictors for 0-objects and 1-objects have been suggested (Gundersen et al., 1999; Kiêu et al., 1999). Moreover, the estimators may be further improved by including terms for random error (nugget variance) (Gundersen et al., 1999). In the present paper, it is assumed that the sectioned object is primarily a ‘black box’ which cannot be safely classified as a 0- or 1-object, which is the usual situation in applications. Hence the estimators CEsh1 and CEsh2 were tested on all data sets as if they appeared in an empirical setting.
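
As a simple worked example of Eq. (7) (our illustration): for model M1, a horizontal line of height c on [0, h], the integrand f(x) f(x + t) equals c^2 wherever the supports [0, h] and [-t, h - t] overlap, so that

    g(t) = c^2 (h - |t|) \quad \text{for } |t| \le h, \qquad g(t) = 0 \text{ otherwise.}

The behaviour of g near the origin reflects the smoothness class of the object; this is exactly the feature exploited by the short-cut predictors.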

2.6. Test of CE estimators on deterministic data sets

To show how the performance of the estimators described in sections 2.4 and 2.5 was tested using data sets of deterministic models, the procedure is illustrated by an example (Fig. 1), where the case m = 5.4 is shown for model M3 with b = 1, a = 4, c = 1/2, i.e. y = 1 − 4(x − 0.5)^2 in the interval [0, 1]. Here the distance between successive planes is d = 1/5.4 = 0.1852; hence the first plane has to be placed uniformly into the interval [0, 0.1852). If it happens to fall into [0, 0.0741), the systematic probe generates 6 sections; if it falls into [0.0741, 0.1852), it produces only 5 sections (as 0.0741 = 0.1852 × 0.4). When estimating the CE from data, the investigator does not know m, but is only confronted with an empirical series of 5 or 6 areas. Multiplication of the sum of these areas by d = 0.1852 yields the Cavalieri estimator. The data, e.g. A1, A2, A3, A4, A5, plus the boundary values are read into computer memory as [0, 0; 0.5, A1; 1.5, A2; 2.5, A3; 3.5, A4; 4.5, A5; 5.5, 0] (Fig. 1a; see also Mattfeldt, 1989). Cubic spline interpolation is performed between these values (Fig. 1b). Then 100 values of V̂ are computed for systematic samples with startpoints 0, 0.01, 0.02, … , 0.99. From these 100 values, the mean value and the SD of V̂, and hence the CE(V̂), are computed. Thus the estimate of the CE for one realization of the model, here for 5 particular Ai-values, is obtained.

Figure 1. (a) The deterministic model M3 (Cavalieri sections through a triaxial ellipsoid) was simulated 100 times for each m-value, here for m = 5.4. The latter value leads to 60 data sets with 5 sections and 40 data sets with 6 sections (see also Fig. 1c). In this diagram a selected simulation with a series of 5 Ai values is plotted at xi = 0.5, 1.5, ... , 4.5. The values at the boundaries are set to (0, 0) and (5, 0); hence in total 7 data points are available for further evaluations. (b) The 7 points (crosses) from Fig. 1a have been linked with cubic splines. Systematic sectioning is simulated by evaluating the spline function at 100 deterministic startpoints in the interval [0, 1), i.e. at (0, 0.01, 0.02, 0.03, ... , 0.99), and at the subsequent xi-values a distance 1 apart. The simulated systematic sample generated by one of these startpoints is indicated as ‘×’. (c) One of the 40 simulations of M3 with m = 5.4 which generated 6 systematic sections. (d) Simulation of systematic sections (‘×’) after spline interpolation of the data of Fig. 1c (+).

For the whole series of computations for a single m-value, the procedure above was carried out for 100 area series of the model, i.e. 100 data sets starting from uniform points in [0, d). These 100 data sets were then presented to the systematic sectioning simulation program. For each of the 100 data sets per m-value, 100 loops of systematic sectioning simulation followed. In the end, the mean CE over all 100 data sets per m-value was computed and compared with the analytical value. Then the CEs were computed using the two short-cut estimators CEsh1 and CEsh2. The deterministic models M1–M6 have been described extensively in Mattfeldt (1989) and will not be reiterated here in detail. Suffice it to say that they correspond to the following models: M1: a horizontal line, y = c; M2: a line through the origin, y = cx; M3: the polynomial y = b − a(x − c)^2 with a > 0, b > 0, on the interval where y ≥ 0, i.e. the measurement function corresponding to Cavalieri estimation of the volume of a triaxial ellipsoid; M4: the sine function y = sin(x) in [0, π]; M5: the function y = 2√(c^2 − (x − c)^2) in [0, 2c], corresponding to the linear integration of a circle; M6: the function y = 1 − 2|x − 0.5| in [0, 1], corresponding to the linear integration of an isosceles triangle. The models M1–M6 are illustrated in the insets of Fig. 2(a–f). The comparison of the predicted values of the mean CE(m) with the analytical value of CE(m) was performed for the m-values 1, 1.1, 1.2, … , 10.
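
The test loop for a single m-value may be sketched as follows (an illustrative sketch, not the original program, using the conventions stated above: boundary zeros at x = 0 and x = n as in the figure captions, and equidistant rather than random start offsets for the 100 simulated area series, which makes the run reproducible):

    import numpy as np
    from scipy.interpolate import CubicSpline

    def m3(x):
        """Measurement function of model M3 with b = 1, a = 4, c = 1/2:
        y = 1 - 4(x - 0.5)^2 on [0, 1]."""
        return 1.0 - 4.0 * (x - 0.5) ** 2

    def ce_sys_from_areas(areas):
        """CEsys from one area series (spline plus 100 deterministic start
        points), as in Section 2.4."""
        n = len(areas)
        spline = CubicSpline(np.r_[0.0, np.arange(n) + 0.5, float(n)],
                             np.r_[0.0, areas, 0.0])
        v = np.array([spline(np.arange(s, n, 1.0)).sum()
                      for s in np.arange(100) / 100.0])
        return v.std() / v.mean()

    def mean_ce_for_m(m, n_series=100):
        """Average predicted CE over n_series simulated Cavalieri samples of
        model M3 with mean section number m (sketch of the loop of Section 2.6)."""
        d = 1.0 / m                            # normalized section distance
        ces = []
        for k in range(n_series):
            z = (k + 0.5) / n_series * d       # start offsets spread over [0, d)
            x = np.arange(z, 1.0, d)           # section positions in [0, 1)
            ces.append(ce_sys_from_areas(m3(x)))
        return np.mean(ces)

    print(mean_ce_for_m(5.4))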

2.7. Application of CE estimators to empirical data sets

For the studies on real structures, the following data sets were used. The first series of data sets consisted of sectional areas of rat hearts, comprising the tissue compartment (heart muscle) of the left ventricle (Mattfeldt, 1987; Table 1). Ten hearts were available, in which 10–13 systematic area values per case were measured by planimetry. Furthermore, 13 data sets of various examples of Cavalieri estimates of absolute volume of biological objects in the macroscopic and microscopic domain (organs, cells), which have been made accessible under ftp://ftp.imf.au.dk/pub/dist/stoclab/systematic.dat, were used for CE prediction (Gundersen et al., 1999; data sets V1–V13). There were 38–90 area values per data set. For these data sets, the evaluation started with second-degree polynomial regression and computation of R2. Then computer-intensive simulation of systematic sectioning was performed, which provided CEsys, and the final CE = CEtot was estimated by inserting R2 (eq. 3) into eq. (5) (see Figs 3–5).

Figure 3. A selected example of rat heart section areas Ai is plotted as a function of section position xi. The 12 measured areas are plotted at xi = 0.5, 1.5, ... , 11.5 as crosses. A quadratic polynomial was fitted to the data points (full drawn curve) with standard software for polynomial regression of Ai on xi according to the principle of least squares. After fitting, an analysis of variance is performed, which provides the values TSE and TSS for computation of R2 in eq. (3), which is later used to determine CEtot via eq. (5) (see also Fig. 5). In general, quadratic polynomial regression of the area values on xi yielded good fits for the rat heart data set, as indicated by the high R2 values > 0.97 in Table 1.

Figure 4. The same dataset as shown in Fig. 3 is used for simulation of systematic sectioning. Again the data points Ai are plotted at (0.5, 1.5, ... , 11.5). At the boundaries the values are set to (0, 0) and (12, 0). The resulting 14 values are joined with cubic splines (full drawn). Systematic sectioning is simulated by 100 series of systematic sections which arise from 100 startpoints uniformly and deterministically distributed in [0, 1). A selected example of such a virtual systematic area series is indicated as ‘×’.

Figure 5. Computation of the final estimate CEtot for an empirical Cavalieri sample (Aarhus biological data set, series 8, see Table 2). The diagram shows R2 on the x-axis and CE^2 on the y-axis. The value CEran^2 is determined from the unordered area values using the standard formula CEran^2 = CV^2(Ai)/n for n independent observations. It would predict the estimator variance correctly if there were no correlation in the data set; hence it is plotted at R2 = 0. The value CEsys^2 would explain the entire variance in the complete absence of error in the model; it was determined by simulation (see Fig. 4) and is plotted at R2 = 1. For a sample where the total error is due to systematic sampling plus error, the total CE may be estimated by linear interpolation between CEran and CEsys. In this example R2 = 0.18, i.e. a very low value. This weighting procedure leads to more realistic CE predictions for very noisy data sets, see e.g. Table 2, data sets 6–9.

While analytical results can be used as a cross-check for the CE predictions for deterministic models, no such exact control values are available for empirical data sets. An approximate cross-check is nevertheless possible (Mattfeldt, 1989; Gundersen et al., 1999). In this case, subsets of the original data sets with increased (e.g. twofold, threefold, … , n-fold) sampling distance can be drawn from the original complete sample. Then it is possible to estimate V̂ from each of the systematic subsamples, determine the SD of the V̂-values for this increased sampling distance and thus estimate a CE(V̂). This value can in turn be used as a cross-check when the CE is predicted from each of the subsamples. This approach had formerly been applied to the rat heart data set by doubling the sampling distance (Mattfeldt, 1989). For the Aarhus biological data set, subsampling was performed in such a manner that the subsamples were reduced to series of ≈ 10 sections on average (Gundersen et al., 1999). In the present study, both data sets were subsampled in the same manner as suggested in the original papers. The CEs were predicted from the subsamples using the computer-intensive method with consideration of R2, and with the short-cut estimators CEsh1 and CEsh2. The predictions by these three methods were compared to the resampling estimate.
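
A sketch of the resampling cross-check for a single series (our illustration; the split into k interleaved subsamples corresponds to a k-fold sampling distance, and the use of the sample SD with denominator k − 1 is an assumption, since the text does not specify it):

    import numpy as np

    def ce_resampling(areas, k=2):
        """Resampling cross-check (sketch of Section 2.7): split one Cavalieri
        series into k interleaved subsamples with k-fold sampling distance,
        estimate the volume from each, and return the CE of these estimates."""
        areas = np.asarray(areas, dtype=float)
        # Subsample j contains areas j, j+k, j+2k, ...; each sum is proportional
        # to a Cavalieri volume estimate (the common factor k*d cancels in the CV).
        v = np.array([areas[j::k].sum() for j in range(k)])
        return v.std(ddof=1) / v.mean()

    # Hypothetical series of 12 areas; k = 2 doubles the sampling distance,
    # as was done for the rat heart data.
    areas = [0.2, 0.9, 1.8, 2.6, 3.0, 3.1, 2.8, 2.3, 1.6, 0.9, 0.4, 0.1]
    print(ce_resampling(areas, k=2))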

(3) Results

3.1. Deterministic models

For the data sets from the deterministic models, the CE(m) was estimated directly as CEsys from the simulation of systematic sectioning by computer. The results are plotted in Fig. 2(a–f) for CEsys, CEsh1 and CEsh2 together with the true analytical values. The Zitterbewegung is evident in the plots of the exact data. For the models M1 and M2, CEsys and CEsh1 lay closer to the analytical values, whereas the values provided by CEsh2 were too small (Fig. 2a,b). On the other hand, for the models M3, M4 and M6 the estimator CEsh2 lay closest to the analytical data, whereas CEsys and CEsh1 were too high (Fig. 2c,d,f). In the case of M5, neither of the two short-cut estimators fell between the hills and valleys of the analytical function; CEsh1 was too high and CEsh2 was too low (Fig. 2e). In the plots of CEsys a faint echo of the Zitterbewegung of the true CE is preserved, whereas the latter is not evident in the plots of CEsh1 and CEsh2.

3.2. Heidelberg rat heart data set

The results for this data set are reported in Table 1. Polynomial regression provided estimates of R2 between 0.970 and 0.997; that is, the largely predominant fraction of the variance between the area values within the cases was explained by the polynomial model. Consequently, the primary estimates of the CE due to systematic sectioning were only slightly inflated after consideration of R2. The quantities CEtot, CEsh1 and CEsh2 were estimated separately from the two subsamples with doubled sampling distance per case and averaged; as a cross-check, the CE was computed directly as a resampling estimate from the mean and SD of the V̂-values of the two subsamples. CEtot and CEsh1 were of the same order of magnitude. For two individual hearts, CEtot was nearest to the resampling estimate, for six hearts CEsh1 was nearest, and in two cases CEsh2 was nearest. Considering only the average values, both CEtot and CEsh1 were of the same order of magnitude as CEres, whereas CEsh2 was distinctly too low. Note that CEsh2 was the best predictor in those individual cases where the resampling estimate happened to be extraordinarily low (cases 2, 10).

3.3. Aarhus biological data set

When the polynomial regression studies were applied to the Aarhus data, the values of R2 ranged between 0.182 and 0.978, i.e. the residual variance was sometimes more pronounced in relation to the variance explained by the polynomial model (see Table 2). Consequently, the inflation of the CE estimate because of small values of R2 was very pronounced in some cases. As with the heart data sets, the mean CEsh1 and CEtot were globally of the same order of magnitude as the resampling estimates, whereas the mean CEsh2 ranged lower. In six cases CEtot lay nearest to the target, in six cases the best predictor was CEsh1, and in one case it was CEsh2.

Table 2. Aarhus biological data. Resampling: subsamples of ≈ 10 sections. The asterisk marks, for each case, the predictor (CEtot, CEsh1 or CEsh2) lying closest to the resampling estimate CEres.

No.    CEsys     R2        CEran     CEres     CEtot      CEsh1      CEsh2
 1     0.04390   0.97865   0.12009   0.03848   0.04679    0.03134*   0.00700
 2     0.01075   0.83595   0.16063   0.03318   0.06344    0.03551*   0.00793
 3     0.01629   0.84025   0.16375   0.00939   0.06583    0.03161    0.00706*
 4     0.01723   0.77910   0.18042   0.07118   0.08378*   0.04108    0.00918
 5     0.02781   0.78479   0.13681   0.02042   0.06865    0.02689*   0.00601
 6     0.07132   0.40562   0.16502   0.20405   0.13844*   0.08581    0.01918
 7     0.08147   0.36710   0.47225   0.22943   0.37127    0.19075*   0.04265
 8     0.02881   0.18218   0.61219   0.48476   0.55335*   0.25111    0.05615
 9     0.04319   0.21696   0.15617   0.16479   0.13928*   0.07915    0.01769
10     0.01896   0.76586   0.15750   0.03124   0.07745    0.04113*   0.00919
11     0.01762   0.60457   0.18688   0.02960   0.11858    0.04802*   0.01073
12     0.00695   0.85250   0.19240   0.05574   0.06988*   0.03106    0.00694
13     0.01014   0.91372   0.18469   0.06155   0.05490*   0.02550    0.00570
Mean   0.03034   0.65594   0.22221   0.11029   0.14243    0.07069    0.01580

(4) Discussion

The results may be summarized as follows. The CE of the Cavalieri estimator was predicted by a computer-intensive method based on a polynomial regression procedure and simulation of systematic sectioning, and by two established short-cut estimators based on transitive theory (CEsh1, CEsh2). When applied to synthetic data sets from deterministic models with analytically known CEs, very similar predictions were obtained for all models by CEsh1 and the computer-intensive estimator CEsys, whereas CEsh2 ranged much lower. For models M1 and M2, the estimators CEsys and CEsh1 gave better predictions than CEsh2 (Fig. 2a,b). For the models M3, M4 and M6, the estimator CEsh2 provided the best prediction, whereas CEsys and CEsh1 were too high (Fig. 2c,d,f). The better prediction of CE(m) by CEsh1 for models M1 and M2 is fully concordant with theoretical expectation, as these models are examples of 0-objects for which CEsh1 is most suitable. The better prediction by CEsh2 in the case of models M3, M4 and M6 is plausible, because these models are 1-objects for which the predictor CEsh2 is superior. In the case of M5, CEsh1 was too high and CEsh2 was too low (Fig. 2e). The latter result is due to the fact that M5 is a counterexample to classical transitive theory. This model is only covered by the new fractional approach to variance estimation, as m is not an integer in this case (García-Fiñana & Cruz-Orive, 2000a, 2000b, 2004; García-Fiñana et al., 2003; Cruz-Orive & Geiser, 2004). On the whole, it was not possible to clearly enhance the predictive accuracy over the short-cut estimators by introducing the computer-intensive method. One might expect that this estimator would lie, owing to its assumption-free nature, always closest to the analytical CE(m), thus resembling CEsh1 for M1 and M2, resembling CEsh2 for M3, M4 and M6, and outperforming both in the case of M5. This was evidently not the case: it almost always behaved very similarly to CEsh1.

The same estimators were then applied to empirical data sets. Here, reduced data sets were generated from the primary samples with increased sampling distance. On the whole, similar estimates were obtained for CEsh1 and CEtot in both data sets, whereas the results for CEsh2 were distinctly lower. For both data sets, the mean CEsh1 and CEtot lay nearest to the resampling estimates, whereas the mean CEsh2 appeared too low. Nevertheless, in some individual area series of both data sets, CEsh2 happened to provide the best prediction. This was true for the series with the lowest values of CEres in both data sets. In this context one should note that the resampling estimate, although it serves as the reference value, is itself an estimator of the CE derived from empirical data. The estimates of CEres may vary strongly, because they may fall into the hills and valleys of the Zitterbewegung, too. Hence it is very difficult to obtain a robust prediction of the CE for an individual real case. As a recommendation for a short-cut estimator for our empirical data sets, one would prefer CEsh1 over CEsh2. The latter produced the best predictions for some of the deterministic models and also for individual cases of the empirical data sets, but its mean value was too low. The computer-intensive estimator CEtot also yielded higher estimates of the CE than CEsh2 for the empirical data sets. On the other hand, it was not possible to increase the accuracy of prediction further by using CEtot instead of CEsh1 in the case of empirical data either. The estimator CEtot seems rather robust to noisy data, as its computation takes the noise component into account by definition. Comparing CEsh1 and CEtot, one must admit that the latter involves more computational workload. However, once the data set is entered in electronic form, the procedure is fully automatic, so the human workload is not different. Also, computer-intensive procedures do not necessarily imply long execution times: the estimation of a Cavalieri CE by PC is performed within a few seconds for a single data set consisting of 10–20 area values. On the other hand, by simply simulating the systematic sectioning process, the computer-intensive approach does not involve advanced mathematics such as transitive theory, and all steps of its computation are easy to grasp. In the end, it is planned to make the programs described in this paper easily accessible for the PC user, who only has to prepare an ordered list of the cross-sectional areas as an ASCII file and obtains R2 and a prediction of the CE of the volume estimate as output.

In the present study, Cavalieri sampling was simulated in a deterministic manner, whereas in the previous studies it was simulated by Monte Carlo methods, where the startpoint of the systematic sample was placed uniformly at random into [0, 1). In the procedure used here, the sectioning was simulated by distributing the startpoints equidistantly in [0, 1), i.e. at (0, 0.01, 0.02, … , 0.99). A uniform distribution, whether deterministic or random, is required anyway to guarantee unbiasedness when applying the Cavalieri estimator. With the deterministic approach, the points are distributed as homogeneously as possible. When 100 points are selected at random by Monte Carlo methods, it may happen that clusters of points concentrate in some regions of [0, 1) while other regions are undersampled. The deterministic homogeneous distribution of points avoids this. Hence, in repeated estimations of the CE from a given data set, the deterministic approach reproducibly provides the same value, which is not the case when Monte Carlo methods are used.

Selection of an explicit model for regression of the Ai on xi remains essentially an arbitrary decision. In the present paper quadratic polynomial regression was applied. This model is guided by experience with the structure of human organs and cells; it should be realistic for a broad class of biological structures. The present approach is conceptually different from estimators of the CE in Cavalieri sampling based on transitive theory; nevertheless, there is a fundamental parallel. In both approaches a quadratic approximation is chosen, albeit at a different level. Using transitive theory, the approximation is applied to the covariogram of the data, whereas in this paper the approximation is applied to the data series itself. Recently, the application of these two basic approaches – covariogram vs. measurement function – has been the subject of a lively discussion (see Glaser, 2005; and the reply by Cruz-Orive & García-Fiñana, 2005). It follows from eq. (7) that there is a close correspondence between the measurement function and the covariogram. According to common sense, both approaches appear plausible. Presently, the data are not sufficient to decide whether one of the two alternatives is superior to the other; the approaches have seldom been compared. In general, the short-cut estimators based on transitive theory have been applied more often to empirical data, whereas prediction of the CE directly on the basis of the measurement function has only rarely been used. The computational workload of the latter approach may play a role. In the present study, the older estimator based on transitive methods for 0-objects, unmodified for error/nugget variance, and the computer-intensive prediction of the CE on the basis of the measurement function yielded generally consonant results. A similar finding was reported earlier using Monte Carlo simulation of systematic sampling as compared to CEsh1 (Mattfeldt, 1989).

In the present study, systematic sampling was studied in synthetic deterministic data sets without error, and in empirical data sets from the real world with unknown error. The missing link between these approaches would be synthetic data with known error. Hence, as an improved model for real data, simulation studies of models with a deterministic component and a superimposed random error term could be helpful. The model equations as well as the relative strength of the error component would be known for such models, whereas they are never exactly known for empirical data. For synthetic models with error components it is difficult or impossible to obtain analytical results for CE(m), in contrast to the error-free models M1–M6. However, it is feasible to produce very long series of simulated Ai values (e.g. some hundreds or thousands), and to draw shorter subsamples leading to resampling estimates, just as was done with the empirical data. This would be a further challenge for the CE estimators, which could be studied in more depth in this setting.

The present study was restricted to a comparison of a computer-intensive method with the two most popular estimators derived from classical transitive theory. As mentioned above, further estimators of the CE for 0-objects and 1-objects are available (Gundersen et al., 1999; Kiêu et al., 1999). Further improved estimators based on transitive theory have been presented, in which it was attempted to overcome the dichotomy between 0-objects and 1-objects (García-Fiñana & Cruz-Orive, 2000a, 2000b, 2004; García-Fiñana et al., 2003; Cruz-Orive & Geiser, 2004). Essentially, the concept of a fractional trend of the variance in Cavalieri sampling has been considered, which allows a more flexible characterization of real objects. Instead of a binary classification into the classes m = 0 or m = 1, the new approach leads to a characterization of the object in terms of a continuous smoothness constant q ∈ [0, 1]. For q = m = 0 and q = m = 1, the well-known estimators (eqs 6a, 6b) result, with the constant factors α = 1/12 and α = 1/240, respectively. For continuous q-values between 0 and 1, the factor α(q) is a nonlinear function of q, which can be computed directly or read off the graphical plot of α(q) in García-Fiñana et al. (2003) and García-Fiñana & Cruz-Orive (2004). The factor α(q) lies between 1/12 and 1/240; for example, one finds α(q) ≈ 1/45 for q = 1/2 (García-Fiñana et al., 2003; García-Fiñana & Cruz-Orive, 2004). Provided that data from at least 5 sections are available, q and hence α(q) can be estimated from real data sets (García-Fiñana et al., 2003; García-Fiñana & Cruz-Orive, 2004). Thus, fine-tuning of the CE predictor with adaptation to individual data sets becomes possible. These methods will hopefully further increase the accuracy of variance prediction in Cavalieri sampling. For example, Tables 1 and 2 show several cases where the resampling estimate lies between CEsh1 and CEsh2. For these cases, it would be worthwhile to explore whether application of the fractional theory could provide a better prediction. This analysis, however, was beyond the scope of the present study, which was focused mainly on a comparison of a computer-intensive approach with the established short-cut predictors. In future simulation studies of deterministic models with superimposed error, it is planned to include all transitive short-cut estimators based on the m-classification as well as the new method based on q-estimation.

References

  • Cochran, W.G. (1977) Sampling Techniques, 3rd edn. Wiley, New York.
  • Cruz-Orive, L.M. (1989) On the precision of systematic sampling: a review of Matheron's transitive methods. J. Microsc. 153, 315–333.
  • Cruz-Orive, L.M. (1993) Systematic sampling in stereology. Bull. Int. Stat. Inst. 55, 451–468.
  • Cruz-Orive, L.M. (1997) Stereology of single objects. J. Microsc. 186, 93–107.
  • Cruz-Orive, L.M. (1999) Precision of Cavalieri sections and slices with local errors. J. Microsc. 193, 182–198.
  • Cruz-Orive, L.M. & García-Fiñana, M. (2005) A review of the article: Comments on the shortcomings of predicting the precision of Cavalieri volume estimates based upon assumed measurement functions, by Edmund Glaser. J. Microsc. 218, 6–8.
  • Cruz-Orive, L.M. & Geiser, M. (2004) Estimation of particle number by stereology: an update. J. Aerosol Med. 17, 197–212.
  • Cruz-Orive, L.M. & Myking, A.O. (1981) Stereological estimation of volume ratios by systematic sections. J. Microsc. 122, 143–157.
  • García-Fiñana, M. & Cruz-Orive, L.M. (2000a) Fractional trend of the variance in Cavalieri sampling. Image Anal. Stereol. 19, 71–79.
  • García-Fiñana, M. & Cruz-Orive, L.M. (2000b) New approximations for the variance in Cavalieri sampling. J. Microsc. 199, 224–238.
  • García-Fiñana, M. & Cruz-Orive, L.M. (2004) Improved variance prediction for systematic sampling on R. Statistics 38, 243–272.
  • García-Fiñana, M., Cruz-Orive, L.M., Mackay, C.E., Pakkenberg, B. & Roberts, N. (2003) Comparison of MR imaging against physical sectioning to estimate the volume of human cerebral compartments. Neuroimage 18, 505–516.
  • Glaser, E. (2005) Comments on the shortcomings of predicting the precision of Cavalieri volume estimates based upon assumed measurement functions. J. Microsc. 218, 1–5.
  • Gundersen, H.J.G. & Jensen, E.B. (1987) The efficiency of systematic sampling in stereology and its prediction. J. Microsc. 147, 229–263.
  • Gundersen, H.J.G., Jensen, E.B.V., Kiêu, K. & Nielsen, J. (1999) The efficiency of systematic sampling in stereology – reconsidered. J. Microsc. 193, 199–211.
  • Kiêu, K., Souchet, S. & Istas, J. (1999) Precision of systematic sampling and transitive methods. J. Statist. Plan. Inf. 77, 263–279.
  • Ludbrook, J. (2000) Computer-intensive statistical procedures. Crit. Rev. Biochem. Mol. Biol. 35, 339–358.
  • Manly, B.F.J. (1997) Randomization, Bootstrap and Monte Carlo Methods in Biology, 2nd edn. Chapman & Hall, London.
  • Matheron, G. (1965) Les Variables Régionalisées et leur Estimation. Masson, Paris.
  • Matheron, G. (1971) The Theory of Regionalized Variables and its Applications. Les Cahiers du Centre de Morphologie Mathématique, no. 5. Fontainebleau.
  • Mattfeldt, T. (1987) Volume estimation of biological objects by systematic sections. J. Math. Biol. 25, 685–695.
  • Mattfeldt, T. (1989) The accuracy of one-dimensional systematic sampling. J. Microsc. 153, 301–313.
  • Mattfeldt, T. (1997) Nonlinear deterministic analysis of tissue texture: a stereological study on mastopathic and mammary cancer tissue using chaos theory. J. Microsc. 185, 47–66.
  • Mattfeldt, T. & Fleischer, F. (2005) Bootstrap methods for statistical inference from stereological estimates of volume fraction. J. Microsc. 218, 160–170.
  • Noreen, E.W. (1989) Computer-intensive Methods for Testing Hypotheses. An Introduction. Wiley, New York.
  • Sachs, L. (2002) Angewandte Statistik. Anwendung statistischer Methoden, 11th edn. Springer, Berlin.
  • SAS Institute (2000) SAS/STAT User's Guide, Version 8. SAS Institute Inc., Cary.
  • Shumway, R.H. (1988) Applied Statistical Time Series Analysis. Prentice Hall, Englewood Cliffs.