Unité Mathématiques et Informatique Appliquées, Institut National de la Recherche Agronomique, Domaine de Vilvert, 78352 Jouy-en-Josas Cedex, France *Modal’X, bâtiment G, Université Paris-X, 200 avenue de la République, 92001 Nanterre Cedex, France
Total planar area can be estimated based on sampling by a lattice of figures (e.g. point patterns, line segments, quadrats). General formulae are provided for the approximation of mean squared errors. The approximation formulae are products of the boundary length and of a parameter that depends only on the sampling scheme. An R package is provided by the authors for the numerical computation of the mean squared error formulae. The speed of convergence of the mean squared error approximation is assessed on the basis of several simulations. Several sampling schemes are compared in view of the approximated mean squared errors.
The total area of a planar structure can be estimated from partial observations using standard tools of sampling theory. Common sampling probes are finite sets of points, lines, quadrats, etc. Measurements to be performed are counts, length or area measurements. In most cases, the sampling probes or units are systematically distributed on the plane. The whole sampling device is a lattice of figures. In microscopy, a figure corresponds to a sampling probe as seen in a single field of vision. The whole lattice of figures is obtained by systematic displacements of the field of vision.
Under the standard assumption that the lattice of figures is uniformly randomly translated, the total area estimator is unbiased. However, the precision of the area estimator depends both on the sampling scheme and on the spatial distribution of the structure of interest.
An approximation formula has been proposed by Kendall (1948) for the mean squared error (MSE) of the area estimator based on sampling by a lattice of points. Kendall's formula converges when the lattice density tends to infinity and it depends only on the curvature along the structure boundary. Kendall's formula has been refined by Matheron (1971): the MSE approximation can be decomposed into the so-called extension term and the oscillating term. Furthermore, under an isotropy assumption, the extension term depends only on the boundary length. For further discussion of the Kendall–Matheron formula, see, for example, Gundersen & Jensen (1987) and Matérn (1989). Recently, it has been proved that under some regularity conditions, the oscillating term of the MSE is of higher order than the extension term if the structure is random (Kiêu & Mora, 2004). Note that the randomness condition makes sense in most biological applications.
In this article, the Kendall–Matheron formula for sampling based on lattices of points is extended to other lattices of figures. This extension is obtained by using grading and regularization as defined by Matheron (1971). Unbiased area estimation based on sampling by lattices of figures is introduced in section 2. The Kendall–Matheron formula for point lattice sampling is given in section 3. The new formulae for general lattices of figures are provided in section 4. The performance of the MSE approximations is discussed in view of some simulations in section 5. Section 6 is devoted to the comparison of various sampling schemes.
2. Lattice of figures and area prediction
The structure of interest is a random compact set X in the two-dimensional plane ℝ2. The parameter to be approximated is the area of X. Since here the area of X is supposed to be random, below we refer to area prediction instead of area estimation.
In order to predict the area A of X, the random compact set X is sampled by means of a lattice of figures (planar subsets). Examples of lattices of figures are provided in Fig. 1. In a lattice of figures, the figures differ only by translations, and the set of such translations is a vector lattice. Any lattice of figures can be represented as
Λ + F2,
where Λ is a vector lattice and F2 is a planar subset. In the examples of Fig. 1(a–d), the vector lattice is two-dimensional and the figure F2 is compact. For the lattice of lines and the lattice of strips shown in Fig. 1(e–f), the vector lattice is one-dimensional and the figure F2 may be decomposed as
F2 = L + F1,
where L is the line orthogonal to Λ, and F1 is a compact subset of the line supporting Λ. When F1 is a single point, F2 = L + F1 is the line parallel to L through F1. When F1 is a segment, F2 = L + F1 is a strip. The density of the lattice Λ is defined as the inverse of the area (or length) |Λ| of a fundamental tile of Λ. A unit lattice is a lattice with density equal to 1. The approximation formulae provided below involve dual vector lattices. The dual Λ* of a vector lattice Λ is the set of vectors whose scalar product with every vector of Λ is an integer; two lattices are dual if each is the dual of the other. For more details about lattice theory, see Conway & Sloane (1999). The unit square lattice is self-dual. The dual of a hexagonal lattice is also hexagonal. The dual of a lattice with a rectangular tile of side lengths l1 and l2 is the lattice with a rectangular tile of side lengths 1/l1 and 1/l2. More generally, dual lattices have inverse densities.
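The duality relations above are easy to check numerically. The minimal Python sketch below (standard library only; the helper names are illustrative and are not taken from the pgs package) stores a two-dimensional lattice as a 2 × 2 matrix whose columns are basis vectors. The dual basis is then the inverse transpose, so the scalar products between the two bases form the identity matrix; the sketch also lets one check that tile areas are inverse and that the dual of the unit hexagonal lattice is again hexagonal.

```python
import math

def det2(b):
    """Determinant of a 2x2 basis matrix b whose columns are the basis vectors."""
    return b[0][0] * b[1][1] - b[0][1] * b[1][0]

def dual_basis(b):
    """Basis of the dual lattice: the inverse transpose of b (columns are again basis vectors)."""
    d = det2(b)
    return [[b[1][1] / d, -b[1][0] / d],
            [-b[0][1] / d, b[0][0] / d]]

def column(b, j):
    """j-th basis vector of the lattice described by b."""
    return (b[0][j], b[1][j])

def dot(u, v):
    return u[0] * v[0] + u[1] * v[1]

# Unit hexagonal lattice: basis a*(1, 0) and a*(1/2, sqrt(3)/2), tile area a^2*sqrt(3)/2 = 1
a = math.sqrt(2 / math.sqrt(3))
hexagonal = [[a, a / 2], [0.0, a * math.sqrt(3) / 2]]
# Rectangular lattice with a 2 x 0.5 tile (the "rectangular 1" lattice of section 6)
rectangular = [[2.0, 0.0], [0.0, 0.5]]

for name, b in (("hexagonal", hexagonal), ("rectangular", rectangular)):
    d = dual_basis(b)
    # Scalar products between basis and dual basis vectors: the identity matrix (integers)
    gram = [[round(dot(column(b, i), column(d, j)), 12) for j in range(2)] for i in range(2)]
    print(name, "tile areas:", abs(det2(b)), abs(det2(d)), "products:", gram)
```

For the rectangular lattice the dual basis is diag(0.5, 2), i.e. a tile of side lengths 1/2 and 1/0.5, as stated above.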
Consider a randomly translated lattice of figures
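$$\Lambda_U + F_2 = \Lambda + U + F_2,$$
where U is a random translation vector uniformly distributed in a fundamental tile of Λ. The random lattice of figures samples the plane uniformly, and the sampling density of ΛU + F2 is given by
$$\rho = \frac{|F_2|}{|\Lambda|} \ \text{if } F_2 \text{ is compact}, \qquad \rho = \frac{|F_1|}{|\Lambda|} \ \text{if } F_2 = L + F_1,$$
where |Fi| denotes the ‘content’ of Fi (number of points, length or area, depending on the dimension of Fi).
The area predictor
$$\hat A = \frac{1}{\rho}\,\bigl|X \cap (\Lambda_U + F_2)\bigr| \tag{1}$$
is unbiased (conditionally on X); here the content of X ∩ (ΛU + F2) is a point count, a total length or an area, matching the dimension of the figures. The precision of Â can be quantified through the MSE defined as
$$\mathrm{MSE}[\hat A] = \mathbb{E}\bigl[(\hat A - A)^2\bigr]. \tag{2}$$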
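As a sanity check of unbiasedness in the simplest case (F2 a single point, so that the area predictor (1) is |Λ| times the number of lattice points falling in X), the sketch below averages the prediction over a regular grid of translations U in a fundamental tile of a square lattice with spacing a, for a disc X of radius r. The disc, the radius and the spacing are arbitrary illustrative choices, and the grid average only stands in for the expectation over the uniform translation U.

```python
import math

def point_count_in_disc(r, a, ux, uy):
    """Number of points of the translated square lattice a*Z^2 + (ux, uy) in the disc of radius r centred at 0."""
    count = 0
    j_min = math.ceil((-r - uy) / a)
    j_max = math.floor((r - uy) / a)
    for j in range(j_min, j_max + 1):  # rows y = uy + j*a that cut the disc
        y = uy + j * a
        half = math.sqrt(max(r * r - y * y, 0.0))
        # integers i with |ux + i*a| <= half
        i_min = math.ceil((-half - ux) / a)
        i_max = math.floor((half - ux) / a)
        count += max(i_max - i_min + 1, 0)
    return count

def mean_area_prediction(r, a, n_grid=40):
    """Average of A_hat = |Lambda| * count over an n_grid x n_grid grid of translations in [0, a)^2."""
    total = 0.0
    for p in range(n_grid):
        for q in range(n_grid):
            ux = (p + 0.5) * a / n_grid
            uy = (q + 0.5) * a / n_grid
            total += a * a * point_count_in_disc(r, a, ux, uy)
    return total / (n_grid * n_grid)

r, a = 1.0, 0.3
print(mean_area_prediction(r, a), math.pi * r * r)
```

Individual translations give predictions scattered around πr²; only their average is close to it, which is the sense in which the predictor is unbiased.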
3. Lattice of points
Consider the special case where F2 consists of a single point, i.e. Λ + F2 is a simple lattice of points. Following Kendall (1948), the MSE can be expressed as
$$\mathrm{MSE}[\hat A] = \sideset{}{'}\sum_{y \in \Lambda^*} \mathrm{PSD}_X(y), \tag{3}$$
where the prime symbol to the right of the summation denotes that the origin is excluded from the summation, and the power spectral density PSDX of X is defined as the Fourier transform (with kernel $e^{-2\pi i\, h\cdot y}$) of the geometric covariogram of X. Following Matheron (1971), the geometric covariogram is the function
$$h \mapsto \mathbb{E}\bigl[\,|X \cap (X + h)|\,\bigr], \qquad h \in \mathbb{R}^2. \tag{4}$$
It can also be expressed as the mean convolution product of the indicator function of X and its reflexion with respect to the origin.
Since the densities of Λ and its dual Λ* are inverse, the asymptotic behaviour of the MSE when the sampling spacing tends to 0 depends only on the behaviour of the spectral density far from the origin. Using an asymptotic approximation of the spectral density and assuming that the boundary of X is isotropically distributed, the Kendall–Matheron formula states that
$$\mathrm{MSE}[\hat A] \approx \frac{B}{4\pi^3}\,|\Lambda|^{3/2}\,Z_{\Lambda_0^*}(3), \tag{5}$$
where B is the mean boundary length of X, $\Lambda_0^* = |\Lambda|^{1/2}\Lambda^*$ is a unit version of Λ*, and Z denotes the Epstein zeta function.
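The Epstein zeta function is a multidimensional extension of the Riemann zeta function defined by
$$Z_\Lambda(s, h) = \sideset{}{'}\sum_{x \in \Lambda} \frac{e^{2\pi i\, x \cdot h}}{\|x\|^{s}}, \tag{6}$$
where the phase h is a vector of the space spanned by Λ. When the phase h = 0, the notation ZΛ(s) = ZΛ(s, 0) is used; for the one-dimensional lattice ℤ, Zℤ(s) = 2ζ(s). Note that the Epstein zeta term in Eq. (5), $Z_{\Lambda_0^*}(3)$ with $\Lambda_0^*$ the unit version of Λ*, is scale-invariant: it depends only on the lattice shape.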
The isotropy assumption is quite restrictive. However, Eq. (5) holds even in the anisotropic case, provided that the sampling grid is isotropically randomly rotated. For a square lattice, Eq. (5) yields
$$\mathrm{MSE}[\hat A] \approx 0.0728\, B\, |\Lambda|^{3/2}.$$
Note that in Matheron (1971) and Gundersen & Jensen (1987), the multiplicative constant is given as 0.0724. This is because Matheron used a numerical approximation based on the Chowla–Selberg expansion of the Epstein zeta function (Chowla & Selberg, 1949). The multiplicative constant provided in Matérn (1985) is the same as the one derived from Eq. (5). Values of the Epstein zeta function must be computed numerically. Direct computations based on Eq. (6) are not efficient, because the series converges very slowly. Alternatives are algorithms based on the Chowla–Selberg expansion or on the incomplete gamma function expansion; see Crandall (1998) for the latter approach.
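As a rough check of the square-lattice constant, the sketch below evaluates Z(3) = Σ′ ‖x‖⁻³ for unit lattices by direct summation over 0 < ‖x‖ ≤ R, then adds the continuous tail estimate 2π/R (the integral of ‖x‖⁻³ outside the disc of radius R for a lattice of density 1). It assumes the normalization in which the summand is ‖x‖⁻ˢ, and it is only the slow direct approach mentioned above made usable by a tail correction, not the Chowla–Selberg or incomplete gamma function method; the truncation radius is an arbitrary choice. The unit square and hexagonal lattices are their own duals up to a rotation, so they can be used directly as Λ*0.

```python
import math

def epstein_zeta3(b, R=200.0):
    """Approximate Z(3) = sum' ||x||^-3 over the unit lattice whose basis vectors are the columns of b.

    Direct summation over 0 < ||x|| <= R plus the continuous tail estimate 2*pi/R.
    """
    (b11, b12), (b21, b22) = b
    det = b11 * b22 - b12 * b21
    if abs(abs(det) - 1.0) > 1e-9:
        raise ValueError("expects a unit lattice (tile area 1)")
    # Integer coefficients (m, n) = b^-1 x are bounded by R times the row norms of b^-1
    k = math.ceil(R * max(math.hypot(b22, b12), math.hypot(b21, b11)) / abs(det))
    total = 0.0
    for m in range(-k, k + 1):
        for n in range(-k, k + 1):
            if m == 0 and n == 0:
                continue
            x = m * b11 + n * b12
            y = m * b21 + n * b22
            r2 = x * x + y * y
            if r2 <= R * R:
                total += r2 ** -1.5
    return total + 2.0 * math.pi / R

square = [[1.0, 0.0], [0.0, 1.0]]
a = math.sqrt(2 / math.sqrt(3))
hexagonal = [[a, a / 2], [0.0, a * math.sqrt(3) / 2]]

z_sq = epstein_zeta3(square)
z_hex = epstein_zeta3(hexagonal)
# Constant c in MSE ~ c * B * |Lambda|^{3/2}, i.e. Z(3) / (4 pi^3)
print(z_sq, z_sq / (4 * math.pi ** 3))
print(z_hex, z_hex / (4 * math.pi ** 3))
```

For the square lattice the sum should come out close to 4ζ(3/2)β(3/2) ≈ 9.034, giving a constant close to 0.0728, slightly above Matheron's 0.0724. The hexagonal value is smaller, as expected from Rankin's optimality result discussed in section 6.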
The rate at which the spectral density of X decays far from the origin is related to the smoothness of its geometric covariogram near the origin. Derivations of MSE formulae as given in Gundersen & Jensen (1987) and in Cruz-Orive (1989) are based on local models for the covariogram near the origin. For the general case where F2 does not reduce to a single point, the measurements involved in the area predictor (1) can be considered as convolution products, which are simpler to handle in the Fourier space. This simplification is used in the next section, where MSE formulae are extended to general lattices of figures.
4. Lattice of figures
4.1. General MSE formulae
As noted in Matheron (1971), when F2 is compact, the area predictor based on a lattice of figures can be considered as a regularization of the area predictor based on a lattice of points. Point-wise measurements at the lattice points are replaced by local spatial averages on the figures. The computation of the spectral density of the regularized measurements is straightforward. Equation (5) extends to
$$\mathrm{MSE}[\hat A] \approx \frac{B}{4\pi^3}\,|\Lambda|^{3/2}\,M(\Lambda, F_2), \tag{7}$$
where M(Λ, F2) is the scale-invariant function defined by
$$M(\Lambda, F_2) = \int Z_{\Lambda_0^*}(3, h)\,\frac{g_{F_{2,0}}(\mathrm{d}h)}{|F_{2,0}|^2}. \tag{8}$$
In the latter formula, $\Lambda_0^* = |\Lambda|^{1/2}\Lambda^*$ is the unit version of the dual lattice, $F_{2,0} = |\Lambda|^{-1/2}F_2$ is a rescaled version of F2, and $g_{F_{2,0}}$ is its geometric covariogram. The geometric covariogram is a measure whose dimension may be 0, 1 or 2 (depending on the dimension and on the shape of F2,0). The total mass of $g_{F_{2,0}}$ is equal to |F2,0|². The ratio
$$\frac{g_{F_{2,0}}}{|F_{2,0}|^2}$$
can be interpreted as the distribution of the difference between two points picked independently and uniformly at random in F2,0. Note that Eq. (7) is very similar to Eq. (5): the Epstein zeta function from Eq. (5) is replaced by a weighted average in Eq. (7), where the weights are given by the geometric covariogram of the figure.
When F2 is not compact and F2 can be decomposed as L + F1 (e.g. F2 is a line or a strip), Eq. (7) no longer holds. Point-wise measurements are replaced by integration along infinite lines. This operation is called grading by Matheron (1971), where a general formula for the spectral density of graded measurements is provided. Using both grading and regularization, the following extension holds for lattices of lines or strips:
$$\mathrm{MSE}[\hat A] \approx \frac{B}{4\pi^3}\,|\Lambda|^{3}\,M(\Lambda, F_1), \tag{9}$$
where M(Λ, F1) is defined as in Eq. (8) with $F_{1,0} = |\Lambda|^{-1}F_1$, the unit dual lattice $\Lambda_0^* = |\Lambda|\,\Lambda^*$ now being one-dimensional. Details on the derivation of Eqs (7) and (9) are given in the Appendix.
4.2. Examples of M functions
In this section, the lattices of figures shown in Fig. 1 are considered and the forms of M functions are given. For the lattices of point patterns, segments and quadrats, the MSE is given by Eq. (7).
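For a lattice of point patterns (Fig. 1a–b), the geometric covariogram is a discrete measure and
$$M(\Lambda, F_2) = \sum_{h} f(h)\, Z_{\Lambda_0^*}(3, h), \tag{10}$$
where f(h) is the proportion of ordered pairs of points (x, x′) of F2,0, including pairs with x = x′, such that x − x′ = h.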
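The weights f(h) are easy to tabulate. The sketch below lists every difference x − x′ over ordered pairs of points of a small pattern (pairs with x = x′ included) and returns the proportion of pairs giving each vector h. Exact rational coordinates avoid spurious distinct keys caused by floating-point rounding. The five-point pattern used here (four corners and the centre of a square of side 0.2) is a hypothetical example, not necessarily the geometry of the pattern in Fig. 4(a), and in practice the weights must be computed for the rescaled pattern F2,0.

```python
from collections import Counter
from fractions import Fraction

def pair_difference_distribution(points):
    """Proportion f(h) of ordered pairs (x, x') of the pattern, x = x' included, with x - x' = h."""
    n = len(points)
    counts = Counter((x[0] - y[0], x[1] - y[1]) for x in points for y in points)
    return {h: Fraction(c, n * n) for h, c in counts.items()}

s = Fraction(1, 5)  # hypothetical side length 0.2, kept exact
pattern = [(0, 0), (s, 0), (0, s), (s, s), (s / 2, s / 2)]
f = pair_difference_distribution(pattern)
print(f[(0, 0)], sum(f.values()), len(f))
```

The weight at h = 0 is 1/n (the n identical pairs among n² ordered pairs), the weights sum to 1, and f is symmetric, f(h) = f(−h). Plugging these weights into the finite sum over h then only requires phased Epstein zeta values.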
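For a lattice of segments (Fig. 1d), the geometric covariogram of the rescaled segment F2,0 is a one-dimensional measure. Let l0 be the length of F2,0 and ω a unit vector parallel to F2,0. Then
$$M(\Lambda, F_2) = \frac{1}{l_0^2}\int_{-l_0}^{l_0} (l_0 - |t|)\, Z_{\Lambda_0^*}(3, t\omega)\,\mathrm{d}t. \tag{11}$$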
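For a lattice of quadrats, the geometric covariogram of the rescaled quadrat F2,0 is a two-dimensional measure. Let l1,0 and l2,0 be the side lengths of the rescaled quadrat F2,0, and ω1, ω2 unit vectors parallel to its sides. The integral involved in Eq. (8) can be written as
$$M(\Lambda, F_2) = \frac{1}{l_{1,0}^2\, l_{2,0}^2}\int_{-l_{1,0}}^{l_{1,0}}\int_{-l_{2,0}}^{l_{2,0}} (l_{1,0} - |t_1|)(l_{2,0} - |t_2|)\, Z_{\Lambda_0^*}(3, t_1\omega_1 + t_2\omega_2)\,\mathrm{d}t_2\,\mathrm{d}t_1. \tag{12}$$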
For lattices of lines or strips, the MSE is given by Eq. (9).
For a lattice of lines, F1 reduces to a single point, and the geometric covariogram is just the Dirac measure at the origin:
$$M(\Lambda, F_1) = Z_{\Lambda_0^*}(3) = 2\zeta(3), \tag{13}$$
where ζ is the Riemann zeta function.
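For a lattice of strips, F1 is a segment. Let l0 be the length of the rescaled segment F1,0 and let ω be a unit vector perpendicular to the strips. The geometric covariogram is a one-dimensional measure and
$$M(\Lambda, F_1) = \frac{1}{l_0^2}\int_{-l_0}^{l_0} (l_0 - |t|)\, Z_{\Lambda_0^*}(3, t\omega)\,\mathrm{d}t. \tag{14}$$
Using integration by parts, the integral can be simplified to
$$M(\Lambda, F_1) = \frac{Z_{\Lambda_0^*}(5) - Z_{\Lambda_0^*}(5, l_0\omega)}{2\pi^2\, l_0^2}. \tag{15}$$
The one-dimensional Epstein zeta function values involved in the above formula can also be expressed as values of the Riemann zeta function and of a polylogarithm function: $Z_{\Lambda_0^*}(5) = 2\zeta(5)$ and $Z_{\Lambda_0^*}(5, l_0\omega) = 2\,\mathrm{Re}\,\mathrm{Li}_5(e^{2\pi i l_0})$.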
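For the unit one-dimensional dual lattice ℤω, and assuming the phase convention in which the summand is e^{2πi x·h}‖x‖⁻ˢ, the phased Epstein zeta function reduces to Z(s, tω) = 2 Σ_{n≥1} cos(2πnt)/nˢ. This makes the strip formulae easy to cross-check: the covariogram-weighted integral M = (1/l0²)∫_{−l0}^{l0}(l0 − |t|) Z(3, tω) dt should agree with its integrated-by-parts form M = [Z(5) − Z(5, l0ω)]/(2π² l0²). The sketch below compares a midpoint-rule quadrature of the first with the truncated cosine series of the second; truncation levels and node counts are arbitrary choices.

```python
import math

def z1(s, t, n_terms=1000):
    """Truncated phased Epstein zeta of the unit 1-D lattice Z:
    sum over n != 0 of exp(2 pi i n t) / |n|^s = 2 * sum_{n>=1} cos(2 pi n t) / n^s."""
    return 2.0 * sum(math.cos(2.0 * math.pi * n * t) / n ** s for n in range(1, n_terms + 1))

def m_strip_closed(l0, n_terms=1000):
    """Integrated-by-parts form (Z(5) - Z(5, l0)) / (2 pi^2 l0^2)."""
    return (z1(5, 0.0, n_terms) - z1(5, l0, n_terms)) / (2.0 * math.pi ** 2 * l0 ** 2)

def m_strip_quadrature(l0, n_nodes=200, n_terms=1000):
    """Midpoint rule for (1/l0^2) * int_{-l0}^{l0} (l0 - |t|) Z(3, t) dt, using the symmetry in t."""
    h = l0 / n_nodes
    total = sum((l0 - (k + 0.5) * h) * z1(3, (k + 0.5) * h, n_terms) for k in range(n_nodes))
    return 2.0 * total * h / l0 ** 2

l0 = 0.5  # rescaled strip width (strip width divided by the strip spacing)
print(m_strip_closed(l0), m_strip_quadrature(l0))
```

As the strip width tends to 0, M tends to 2ζ(3), the value for a lattice of lines, which gives a further check of the closed form.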
Equations (7) and (9) are asymptotic: the ratio of each approximation to the true MSE tends to 1 as the sampling spacing tends to 0; see Kiêu & Mora (2004) for details about regularity conditions. Hence the approximation formulae may perform poorly for too sparse sampling. Section 5 provides some hints on sampling densities for which the approximations hold.
Like the Kendall–Matheron formula for point lattices, the formulae hold under the assumption that either the boundary or the lattice of figures is isotropically orientated.
The formulae can be used in order to assess the precision of area predictions, provided that information about the mean boundary length is available. If area is predicted from a lattice of points or a lattice of point patterns, an additional lattice of lines or line segments can be used in order to estimate the mean boundary length. If area is predicted from lattices of figures such as those in Fig. 1(c–f), the mean boundary length can be estimated from measurements collected for area prediction.
A practical problem is the numerical computation of the Epstein zeta function involved in Eqs (7) and (9). The computation of the Epstein zeta function is not a feature of common scientific software. MSE computations based on Eqs (7) and (9) can be performed using the R package pgs (precision of geometric sampling) available at http://www.inra.fr/miaj/article.php3?id_article=439.
The approximations have a very simple structure: they are products of the mean boundary length and a term depending only on the sampling scheme. Hence ratios of MSE approximations no longer depend on the mean boundary length, and sampling schemes can be compared independently of the structure under investigation; see section 6.
The formulae can be used for the practical design of a sampling scheme. Without loss of generality, let us focus on the case where the figure F2 is compact: the area predictor MSE is given by Eq. (7). The lattice Λ can be written as
Λ = uΛ0,
where Λ0 is a unit lattice and u is a scaling parameter to be determined. The sampling scheme (i.e. u) must be designed such that
MSE[Â] = γ2A2,
where A is the mean area of X, and γ is the target coefficient of error (e.g. γ = 5%). In view of Eq. (7), and since |Λ| = u², the scaling parameter must be the solution of the following equation:
$$u^3\, M(u\Lambda_0, F_2) = \frac{4\pi^3\,\gamma^2 A^{3/2}}{B/\sqrt{A}}. \tag{16}$$
The ratio B/√A is a standard shape parameter and can be assessed visually (at low magnification) from nomograms provided, for example, in Gundersen & Jensen (1987). The mean area A is likely to be unknown, but it can be replaced by a rough estimate. Hence Eq. (16) reduces to an equation with the only unknown parameter u (M(uΛ0, F2) depends on u through the rescaled figure F2/u) and can be solved numerically with standard algorithms. The numerical computation of the scaling parameter u can be performed using the pgs package.
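A minimal sketch of this design step, assuming a square lattice of points, so that M reduces to Z(3) of the unit square lattice (≈ 9.0336, i.e. 4ζ(3/2)β(3/2)) and does not depend on u. For a genuine figure, M would have to be recomputed for each trial value of u, because the rescaled figure changes with u. The helper names, the mean area, the shape ratio B/√A and the target coefficient of error below are hypothetical inputs chosen for illustration; plain bisection stands in for "standard algorithms", and the closed form available in the point case serves as a check.

```python
import math

def predicted_mse(u, boundary, m_of_u):
    """(B / (4 pi^3)) * u^3 * M(u Lambda_0, F2): Eq. (7) with Lambda = u Lambda_0 and |Lambda_0| = 1."""
    return boundary / (4.0 * math.pi ** 3) * u ** 3 * m_of_u(u)

def solve_spacing(area, boundary, gamma, m_of_u, lo=1e-6, hi=1e3, iters=200):
    """Bisection for u with predicted_mse(u) = (gamma * area)^2; assumes the MSE increases with u."""
    target = (gamma * area) ** 2
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if predicted_mse(mid, boundary, m_of_u) < target:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

Z3_SQUARE = 9.0336   # Z(3) of the unit square lattice, 4*zeta(3/2)*beta(3/2), rounded
area = 100.0         # rough guess of the mean area A (illustrative units)
shape = 5.0          # illustrative shape ratio B / sqrt(A)
gamma = 0.05         # target coefficient of error
boundary = shape * math.sqrt(area)

# Lattice of points: M does not depend on u, so a closed form exists to check the bisection
u = solve_spacing(area, boundary, gamma, lambda _u: Z3_SQUARE)
closed_form = (4.0 * math.pi ** 3 * (gamma * area) ** 2 / (boundary * Z3_SQUARE)) ** (1.0 / 3.0)
print(u, closed_form)
```

Bisection is used only because it is robust and dependency-free; any one-dimensional root finder would do once M can be evaluated for each trial spacing.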
In many cases, the parameter to be estimated is the mean area E[A]. An unbiased estimator is provided by the average of independent realizations Â1, …, Ân of the area predictor. Since Â is unbiased conditionally on X, the variance of Â (its global MSE with respect to E[A]) admits the standard decomposition
$$\operatorname{Var}(\hat A) = \mathrm{MSE}[\hat A] + \operatorname{Var}(A). \tag{17}$$
Note that the left-hand side can be estimated by the empirical variance of the Âi's.
Hence, assessing the MSE of Â using Eqs (7) or (9) enables the evaluation of the variance of A.
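The decomposition Var(Â) = MSE[Â] + Var(A) translates directly into a small computation. A minimal sketch, assuming a single MSE value applies to all individuals (for instance the approximation evaluated at the mean boundary length); the predictions and the MSE value below are made-up numbers, and clipping the difference at zero is a design choice of this sketch, not something prescribed in the text.

```python
import statistics

def split_variance(predictions, mse_approx):
    """Estimate E[A] and Var(A) from independent predictions A_hat_1..A_hat_n.

    Relies on Var(A_hat) = MSE[A_hat] + Var(A); the difference is clipped at 0
    because sampling noise in the empirical variance can make it negative.
    """
    mean_area = statistics.fmean(predictions)
    total_var = statistics.variance(predictions)  # unbiased empirical variance of the A_hat_i
    return mean_area, max(total_var - mse_approx, 0.0)

a_hat = [10.2, 9.6, 11.1, 10.4, 9.9, 10.8]   # hypothetical area predictions, one per individual
mse_approx = 0.05                            # hypothetical MSE from Eq. (7) or (9)
print(split_variance(a_hat, mse_approx))
```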
The MSE approximations provided in section 4.1 converge when the fundamental tile of the sampling lattice tends to a single point. At the moment, there is no theoretical result concerning the speed of convergence of the approximations. In this section, this problem is investigated using simulations. We consider three different random sets. The first random set X1 has a simple shape. The two others, X2 and X3, have more complex shapes, due to small-scale spatial variation. X3 is more regular than X2 because of stronger spatial autocorrelations. Two sampling schemes are considered: sampling by hexagonal lattices of points or of point patterns.
The random sets are obtained by thresholding nonstationary random fields. The geometry of a random set is controlled both by the mean and by the covariance function of the random field. The mean µ is taken as a sum of three Gaussian densities (Fig. 2a). The random set X1 is obtained by thresholding at a level t > 0 the nonstationary random field
$$\mu + Z_1, \tag{18}$$
where Z1 is a centred Gaussian random field with a Gneiting covariance function (Fig. 2b,d).
The random set X2 is obtained by thresholding at level t a random field of the following type:
$$\mu + \min(Z_1, t) + Z_2, \tag{19}$$
where Z2 is a centred Gaussian random field with a Gneiting covariance function. The covariance parameters are chosen such that the spatial variation of Z2 is small-scale compared to Z1 (Fig. 2c,e). A similar construction is used for the random set X3, except that the covariance of the random field with small-scale variation is a Bessel function. The parameters of the Bessel function are chosen such that the mean areas and the mean perimeters of X2 and X3 are close. However, X3 tends to be much more regular than X2, due to stronger short-range autocorrelations (Fig. 2e,f).
Note that the simulated random sets are not isotropic, since the deterministic function µ is anisotropic. Statistics on the area and the perimeter of X1, X2 and X3 computed from 3000 replications are provided in Table 1.
Table 1. Statistics for the area and the perimeter of X1, X2 and X3.
The simulations of the random fields have been carried out using the R package RandomFields (Schlather, 2001).
Two types of sampling scheme are applied to the random sets: sampling by lattices of points and sampling by lattices of point patterns. The lattice Λ is a hexagonal lattice. The pattern contains five points (see Fig. 4a), and it spans a square of side length 0.2. The spacing between the figures (point or pattern of five points) varies from 0.90 to 0.23. For X1, when sampled by a lattice of points, the mean total point count increases from 20 to 300.
For each random set Xj and each sampling scheme, the true MSE has been estimated from 3000 independent realizations of (Â, A). The obtained empirical MSEs are compared to the asymptotic approximations in Fig. 3. Note that the estimated MSEs for a given Xj are not independent, since they are based on the same set of 3000 realizations of Xj.
In view of the results shown in Fig. 3, the MSE approximation performs similarly for the two types of lattices of figures.
The MSE approximation yields fairly good results for the simple shape random set X1, even for rather small sample sizes. The worst relative error is about 7%. For X2, the MSE approximation does not perform as well as for X1, and the worst relative error reaches about 25%. When the sampling spacing is less than 0.45, the relative error is less than 6%. The true MSE curve for the random set X3 is not monotone: there is a peak for small sampling densities. This peak is not captured by the asymptotic approximation. However, for larger sampling densities, the MSE approximation turns out to be rather close to the empirically estimated MSE. As for X2, when the sampling spacing is less than 0.45, the relative error is less than 6%.
6. Comparisons of sampling schemes
As noted in section 4.3, sampling scheme performances can be compared independently of the random set X. Numerical comparisons are obtained from the MSE approximation formulae. Without loss of generality, the mean boundary length B is set to 1 and we focus only on unit lattices Λ.
Below, we compare different types of lattices of point patterns. We first consider three particular point patterns, as illustrated in Fig. 4. Each point pattern is rescaled to fill square windows of side length 0.1, 0.3 and 0.5.
The comparison is made for four types of two-dimensional unit lattices: hexagonal, square, rectangular 1 (2 × 0.5) and rectangular 2 (4 × 0.25). All values of the approximate MSE are given in Table 2.
Table 2. Approximate values of MSE for four unit point lattices, three point patterns and three window side lengths.
Columns: window side length; number of points; approximate MSE values for the four lattices (hexagonal, square, rectangular 1, rectangular 2).
The first line of Table 2 shows the approximate MSE values for the four simple point unit lattices. These values are given in Matérn (1989), where these four systematic sampling schemes are compared. The minimum value is obtained for the hexagonal lattice. This illustrates the general optimality result of Rankin (1953), who showed that among lattices of a given density the hexagonal lattice minimizes the Epstein zeta function for all s ≥ 1.035, when the summand is written as ‖x‖^(−2s); this range covers the sum of ‖y‖^(−3) involved in Eq. (5). Hence, the hexagonal lattice is optimal among all point lattices with a given density.
Also note that for window sizes 0.1 and 0.3, the MSEs are quite close for patterns of 5, 8 and 9 points. For such small figure sizes, increasing the number of points per pattern is not an efficient way to improve the precision of the area predictor.
For the different point patterns considered, the approximate MSE values obtained for the hexagonal and square lattice are quite similar. Compared to these two lattices, the performance of the rectangular lattices is quite poor.
Next, for a given lattice of points Λ, a square window W of given side length l and a given number of points n, an optimal pattern of n points included in W can be computed by optimizing the MSE approximation. The results shown in Fig. 5 are obtained using the L-BFGS-B optimization method, which allows box constraints (Byrd et al., 1995). Since this method uses a quasi-Newton algorithm, it yields only a local minimum that depends on the provided initial pattern. For each optimal pattern search, 20 L-BFGS-B optimizations were carried out, starting from randomly chosen initial patterns.
When the side length of the window is small compared to the tile of the lattice, the optimization procedure gives optimal patterns with points located at the corners of the window W. For n > 4, this yields several points at the same corner. This is the case for l = 0.1, as seen in Fig. 5. To give a more complete picture of the case l = 0.1, the optimal patterns and associated approximate MSE values obtained for n = 2, …, 5 are shown in Fig. 6.
From these results, one can see that for n varying from 1 to 7, the smallest MSE is obtained for n = 4. Adding a fifth point increases the MSE value.
Sometimes, lower-dimensional probes are more efficient than higher-dimensional probes. Jensen & Gundersen (1981) considered the example of a disc X sampled by an isotropic uniform random (IUR) unit square and by the pattern consisting of the four corners of the IUR square. It turns out that for sufficiently large values of the disc radius, the point pattern probe performs better than the square probe. This situation, where point counting is more efficient than area measurement, has been investigated and exemplified in Baddeley & Cruz-Orive (1995) for stationary random sets. The authors related this paradox to Smit's paradox appearing in the theory of random fields.
Below, we compare sampling by lattices of quadrats and sampling by lattices of square four-point patterns. The common side length of the quadrats and of the four-point patterns varies from 0.1 to 0.9. The resulting MSE values are plotted in Fig. 7. These results give a new illustration of Smit's paradox: point counting performs better than area measurement for sufficiently small windows.
Finally, let us consider sampling schemes based on segments. The approximation in Eq. (7) can be used in order to compute the optimal segment orientation. As expected, the optimal orientation is given by the diagonal of the sampling point lattice tile. The worst orientation corresponds to a segment parallel to one of the sides of the lattice tile. Table 3 shows the maximal and minimal approximate MSE values for various segment lengths, for both the square and the hexagonal unit lattices.
Table 3. Approximate MSE values for the best and worst segment orientations. The segment length varies from 0.1 to 0.9.
Columns: segment length l; orientation in degrees.
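Substituting the triangular covariogram of a segment into the weighted average of phased Epstein zeta values and exchanging sum and integral gives M = Σ′_{y∈Λ*0} sinc²(π l0 y·ω)/‖y‖³, with sinc(x) = sin(x)/x and ω the segment direction. This identity is derived here, not quoted from the article, and assumes the phase convention in which the summand is e^{2πi y·h}‖y‖⁻ˢ. The sketch evaluates it by truncated direct summation for the unit square lattice (its own dual) and compares a segment of rescaled length 0.5 parallel to a side of the tile with the diagonal orientation; the truncation radius is an arbitrary choice.

```python
import math

def m_segment_square(l0, theta_deg, radius=80):
    """M for a segment of rescaled length l0 at angle theta on the unit square lattice (self-dual):
    sum over nonzero y in Z^2 with ||y|| <= radius of sinc^2(pi * l0 * y.omega) / ||y||^3."""
    wx = math.cos(math.radians(theta_deg))
    wy = math.sin(math.radians(theta_deg))
    total = 0.0
    for m in range(-radius, radius + 1):
        for n in range(-radius, radius + 1):
            r2 = m * m + n * n
            if r2 == 0 or r2 > radius * radius:
                continue
            x = math.pi * l0 * (m * wx + n * wy)
            sinc2 = 1.0 if abs(x) < 1e-12 else (math.sin(x) / x) ** 2
            total += sinc2 / r2 ** 1.5
    return total

for theta in (0.0, 22.5, 45.0):
    print(theta, m_segment_square(0.5, theta))
```

With these settings the diagonal orientation gives the smaller value, in line with the statement above. The hexagonal case of Table 3 would require summing over the dual basis of the hexagonal lattice instead of ℤ².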
In this article, we provide approximation formulae for the MSE of several stereological area predictors. These approximations converge when the sampling density tends to infinity. Simulations show that the convergence is faster for structures with a simple shape. The MSE approximations depend on the structure only through its mean boundary length. As shown in the example in section 5, the MSE approximation may perform poorly when the structure presents strong spatial autocorrelations. In such a case, one may try to derive the MSE from a continuous geometric covariogram model. However, it is still an open problem how to characterize the family of geometric covariograms (Matheron, 1993).
The general MSE approximation in Eq. (7) holds when the boundary of the spatial structure is isotropically orientated. The simulations show that the approximation performs well when the anisotropy is not too pronounced and the lattice of figures is hexagonal. In the case of strong anisotropy, the lattice of figures should be randomly orientated. Also note that Eq. (9) extends to the anisotropic case. The extended formula involves the rose of normal directions to the boundary; see Kiêu & Mora (2004) for the asymptotic approximation of the spectral density in the anisotropic case.
The MSE approximation formulae can be used to assess the precision of stereological area predictions, provided that the mean boundary length is available. In practice, both area and boundary length can be predicted by combining point and line segment sampling; see, for example, Gundersen & Jensen (1987).
Owing to their simple structure, the MSE formulae can be used in sampling design. In particular, sampling parameters such as the spacing between figures or the number of points per point pattern can be computed such that the sampling scheme achieves an expected coefficient of error.
Furthermore, it is possible to compare sampling schemes independently of the structure under investigation. As shown in section 6, when the window (field of vision) is small compared to the lattice tile, the pattern of the four corner points performs better than other probes. Using more points or measuring the area inside the window does not improve the precision of area prediction.
The extension to volume prediction may seem straightforward. The general convergence result for the spectral density of a random compact set given in Kiêu & Mora (2004) holds in spaces with arbitrary dimensions. A set of simple MSE formulae for volume predictors is available in Kiêu & Mora (2005). However, in practice, sampling of three-dimensional structures involves complex nested sampling schemes. In general, top-level stages involve physical sectioning, and physical slabs or blocks are sampled independently. Hence, the whole sampling scheme combines stratified and systematic sampling. MSE formulae for this type of sampling scheme are not yet available, but they can be derived from existing mathematical tools.
The approximation formulae in section 4.1 are based on a convergence result for the spectral density of X. This convergence result holds under the regularity conditions given in Kiêu & Mora (2004). When the normal directions to the boundary of X are isotropically distributed, the asymptotic behaviour of the spectral density is given by
$$\mathrm{PSD}_X(y) \sim \frac{B}{4\pi^3\,\|y\|^3} \tag{20}$$
when ‖y‖ tends to infinity. The result in Eq. (20) can be extended to the anisotropic case by multiplying the right-hand side by πr(ω), where ω = y/‖y‖ and r is the rose of unoriented normal directions to the boundary of X (r ≡ 1/π for isotropically distributed normals, so that the factor then equals 1).
A similar result has been obtained in the case where X is a deterministic convex body (Kendall, 1948). However, in this case an oscillating term of the order of ‖y‖^(−3) must be added in the approximation of the spectral density.
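The oscillating term can be seen on a deterministic disc of radius r, whose spectral density is PSD(y) = (r J1(2πr‖y‖)/‖y‖)². Averaging ‖y‖³ PSD(y) over whole oscillation periods should recover the isotropic asymptote B/(4π³) = r/(2π²), with B = 2πr. The sketch below checks this numerically; J1 is computed from Bessel's integral J1(x) = (1/π)∫₀^π cos(τ − x sin τ) dτ with Simpson's rule, to stay within the standard library. The radius, the frequency window and the step counts are arbitrary choices.

```python
import math

def bessel_j1(x, n=1000):
    """J1(x) via Bessel's integral (1/pi) * int_0^pi cos(tau - x sin tau) dtau, Simpson's rule (n even)."""
    h = math.pi / n
    total = 0.0
    for k in range(n + 1):
        weight = 1 if k in (0, n) else (4 if k % 2 else 2)
        tau = k * h
        total += weight * math.cos(tau - x * math.sin(tau))
    return total * h / (3.0 * math.pi)

def averaged_scaled_psd(r, rho_start, n_periods=2, samples_per_period=100):
    """Mean of rho^3 * PSD(rho) for a disc of radius r, PSD(rho) = (r J1(2 pi r rho) / rho)^2,
    over whole periods 1/(2r) of the oscillation (midpoint samples)."""
    period = 1.0 / (2.0 * r)
    n = n_periods * samples_per_period
    total = 0.0
    for k in range(n):
        rho = rho_start + (k + 0.5) * period / samples_per_period
        psd = (r * bessel_j1(2.0 * math.pi * r * rho) / rho) ** 2
        total += rho ** 3 * psd
    return total / n

r = 1.0
boundary = 2.0 * math.pi * r
print(averaged_scaled_psd(r, 10.0), boundary / (4.0 * math.pi ** 3))
```

Pointwise, ‖y‖³ PSD(y) oscillates between about 0 and twice the asymptote; only the period average matches it, which is why averaging over random shapes (or over frequencies, as here) is needed.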
Furthermore, the convergence of the spectral density is closely connected to a result due to Matheron (1971) concerning the behaviour of the geometric covariogram of X near the origin. For the sake of simplicity, we assume that the random compact set X is isotropic. Following Matheron (1971), p. 36, the geometric covariogram near the origin can be approximated by
$$\mathbb{E}\bigl[\,|X \cap (X + h)|\,\bigr] \approx \mathbb{E}[A] - \frac{B}{\pi}\,\|h\|, \qquad h \to 0. \tag{21}$$
The approximation (21) is proved in Matheron (1975) for deterministic convex bodies. The extension to random bodies is straightforward. Note that in view of Formula (21), the covariogram is not differentiable at the origin. Now let us assume that the geometric covariogram is continuously differentiable outside the origin. Then it is easy to prove by standard Fourier calculus that the convergence result in Eq. (20) holds for the spectral density.
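The linear behaviour near the origin can be seen on the simplest example, a disc of radius r. Its geometric covariogram is the area of the lens X ∩ (X + h), g(d) = 2r² arccos(d/2r) − (d/2)√(4r² − d²) for d = ‖h‖ ≤ 2r. For an isotropic set, Matheron's approximation predicts a slope −B/π at the origin, i.e. −2r since B = 2πr. The sketch checks this with a one-sided finite difference; the disc is deterministic, so it illustrates only the local behaviour, not the random-set setting.

```python
import math

def disc_covariogram(r, d):
    """Area of X ∩ (X + h) for a disc X of radius r and ||h|| = d."""
    if d >= 2 * r:
        return 0.0
    return 2 * r * r * math.acos(d / (2 * r)) - 0.5 * d * math.sqrt(4 * r * r - d * d)

r = 1.5
eps = 1e-6
slope = (disc_covariogram(r, eps) - disc_covariogram(r, 0.0)) / eps
boundary = 2 * math.pi * r
print(slope, -boundary / math.pi)   # both close to -2r = -3.0
```

A Taylor expansion gives g(d) = πr² − 2rd + d³/(12r) + …, so the covariogram is indeed linear to second order near 0, with a kink at the origin.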
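Let us consider the case of a compact figure F2. By regularization, the spectral density of the measurement function
$$x \mapsto \frac{|X \cap (x + F_2)|}{|F_2|}$$
is equal to
$$\mathrm{PSD}_X(y)\,\frac{\mathrm{PSD}_{F_2}(y)}{|F_2|^2}, \qquad y \in \mathbb{R}^2.$$
In the formula above, the spectral density of F2 is equal to the Fourier transform of its geometric covariogram, so that the ratio $\mathrm{PSD}_{F_2}/|F_2|^2$ is the Fourier transform of the normalized covariogram $g_{F_2}/|F_2|^2$ (the distribution of the difference between two random points of F2). Hence the prediction MSE can be written as
$$\mathrm{MSE}[\hat A] = \sideset{}{'}\sum_{y \in \Lambda^*} \mathrm{PSD}_X(y)\,\frac{\mathrm{PSD}_{F_2}(y)}{|F_2|^2}.$$
Finally, the convergence result in Eq. (20) yields the approximation formula
$$\mathrm{MSE}[\hat A] \approx \frac{B}{4\pi^3}\sideset{}{'}\sum_{y \in \Lambda^*} \frac{\mathrm{PSD}_{F_2}(y)}{|F_2|^2\,\|y\|^3} = \frac{B}{4\pi^3}\,|\Lambda|^{3/2}\,M(\Lambda, F_2),$$
which is Eq. (7).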
Next consider the case of sampling by a lattice of lines. The spectral density of the graded measurement function
$$x \mapsto |X \cap (L + x)|$$
is equal to the restriction of the spectral density PSDX to L⊥.
For a general figure F1, the spectral density of the regularized and graded measurement function
$$x \mapsto \frac{|X \cap (L + F_1 + x)|}{|F_1|}$$
can be expressed as
$$\mathrm{PSD}_X(y)\,\frac{\mathrm{PSD}_{F_1}(y)}{|F_1|^2},$$
with y ∈ L⊥. MSE[Â] is equal to the sum of the spectral density of the measurement function over the nonzero vectors of the dual lattice Λ*. Using the convergence result in Eq. (20), we get the approximation formula in Eq. (9).
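A numerical sketch of the line-lattice case, where F1 is a single point and the approximation reads MSE ≈ (B/(4π³)) d³ · 2ζ(3) for line spacing d. Here X is taken to be a disc whose radius is uniform on [1, 2]: an isotropic random set with mean boundary length B = 2π E[r] = 3π. The predictor is d times the total chord length. Its exact MSE for each radius is obtained by averaging over a grid of line offsets, and these values are then averaged over a grid of radii. The radius distribution, the spacing and the grid sizes are illustrative assumptions, and agreement is only expected up to a few percent at this spacing.

```python
import math

ZETA3 = 1.2020569031595942  # Riemann zeta(3)

def line_predictor(r, d, u):
    """A_hat = d * total chord length of the disc of radius r (centred at 0) on the lines x = u + k*d."""
    k_min = math.ceil((-r - u) / d)
    k_max = math.floor((r - u) / d)
    total = 0.0
    for k in range(k_min, k_max + 1):
        x = u + k * d
        total += 2.0 * math.sqrt(max(r * r - x * x, 0.0))
    return d * total

def mse_exact(r, d, n_offsets=100):
    """MSE over the uniform offset u in [0, d), approximated on a midpoint grid."""
    area = math.pi * r * r
    acc = 0.0
    for j in range(n_offsets):
        u = (j + 0.5) * d / n_offsets
        acc += (line_predictor(r, d, u) - area) ** 2
    return acc / n_offsets

d = 0.25
radii = [1.0 + (i + 0.5) / 400 for i in range(400)]  # midpoint grid for r uniform on [1, 2]
mse_sim = sum(mse_exact(r, d) for r in radii) / len(radii)
mean_boundary = 2 * math.pi * 1.5
mse_approx = mean_boundary / (4 * math.pi ** 3) * d ** 3 * 2 * ZETA3
print(mse_sim, mse_approx)
```

Averaging over the radius is what removes the oscillating term that a single deterministic disc would show; for a fixed radius, the exact MSE fluctuates around the approximation as d varies.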