On error prediction in circular systematic sampling

Authors


K. Ý. Jónsdóttir. Tel: +45 8949 4402; fax +45 8949 4400; e-mail: kyj@imf.au.dk

Summary

An extended covariogram model is discussed for estimating the precision of circular systematic sampling. The extension is motivated by recent developments in shape analysis of featureless planar objects. Preliminary simulation results indicate that it is important to consider the extended covariogram model.

Introduction

Recently, the precision of systematic sampling on the circle has been discussed in Gual-Arnau & Cruz-Orive (2000) and Cruz-Orive & Gual-Arnau (2002). In particular, variance estimation formulae based on a global polynomial model for the covariogram have been developed. In Hobolth & Jensen (2002), this approach is discussed both in a design-based and a model-based setting, and an alternative model-based method of estimating the parameter of the covariogram is described.

In this note, we summarize these developments and argue that it may be natural to consider an extension of the polynomial covariogram model; see also the discussion in Hobolth & Jensen (2002). We explain the geometric interpretation of the parameters of the proposed extended model and report preliminary simulation results.

A global polynomial covariogram model

In this note, we will focus on the model-based approach to error prediction in circular systematic sampling. In a model-based setting, the aim is to predict an integral of the form

image

where

image

is a stationary, periodic, non-negative stochastic process of bounded variation that is square integrable and piecewise continuous. As F is stationary, the mean E F(2πt) does not depend on t and equals µ, say, and the covariance

image(1)

0 ≤ h, t < 1, does not depend on h and will be denoted σ(t). In (1), we use a periodic extension of F. The covariance function satisfies σ(t) = σ(1 −t) and therefore, its Fourier expansion takes the form

image

0 ≤ t < 1. The predictor of Q to be considered is of the form

image

where φ ∈ [0, 1/n[. Note that the distribution of the predictor does not depend on φ, due to the stationarity of F. The prediction error incurred by using this predictor for Q can be expressed in terms of the Fourier coefficients λk of the covariance function. We thus have

image(2)

Note that the prediction error only depends on Fourier coefficients of order n and higher.
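
The reason why only orders n, 2n, … contribute can be checked numerically: the systematic sample mean of a harmonic of order k vanishes unless k is a multiple of n, in which case it is aliased to a constant. The following minimal Python sketch illustrates this; the function name systematic_mean and the chosen values of n and φ are illustrative only and do not come from the cited papers.

```python
import math

def systematic_mean(k, n, phi):
    """Mean of cos(2*pi*k*t) over the n systematic points t = phi + j/n."""
    return sum(math.cos(2 * math.pi * k * (phi + j / n)) for j in range(n)) / n

n, phi = 5, 0.03  # phi must lie in [0, 1/n)
for k in range(1, 11):
    # Numerically zero unless k is a multiple of n; then cos(2*pi*k*phi)
    print(k, round(systematic_mean(k, n, phi), 12))
```

Since the integral of every harmonic of order k ≥ 1 over the full circle is zero, harmonics whose order is not a multiple of n are integrated exactly by the systematic sample, and only the aliased orders remain in the error.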

In Hobolth & Jensen (2002), a parametric model for the covariance function is considered

image(3)

where p is a positive integer and the other model parameters β0 and β are chosen such that λk ≥ 0 for k = 0, 1, …. Under (3), the prediction error depends only on β

image(4)

where B2p is a Bernoulli number. For more details on Bernoulli numbers and the associated Bernoulli polynomials, see Abramowitz & Stegun (1965).
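
For readers who wish to evaluate such expressions, Bernoulli numbers and polynomials can be computed exactly from their standard recurrence. The sketch below is not taken from the cited papers; the function names are illustrative, and the convention B1 = −1/2 is assumed.

```python
from fractions import Fraction
from math import comb

def bernoulli_numbers(m):
    """Return [B_0, ..., B_m] as exact fractions (convention B_1 = -1/2)."""
    B = [Fraction(1)]
    for j in range(1, m + 1):
        # Recurrence: sum_{i=0}^{j} C(j+1, i) * B_i = 0 for j >= 1
        B.append(-sum(comb(j + 1, i) * B[i] for i in range(j)) / (j + 1))
    return B

def bernoulli_poly(m, t):
    """Evaluate B_m(t) = sum_{i=0}^{m} C(m, i) * B_i * t^(m-i)."""
    B = bernoulli_numbers(m)
    return sum(comb(m, i) * B[i] * t ** (m - i) for i in range(m + 1))

print(bernoulli_numbers(4)[2], bernoulli_numbers(4)[4])
print(bernoulli_poly(4, Fraction(1, 3)), bernoulli_poly(4, Fraction(2, 3)))
```

Even-order Bernoulli polynomials satisfy B2p(t) = B2p(1 − t), which matches the symmetry σ(t) = σ(1 − t) required of the covariance function.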

Hobolth & Jensen (2002) suggest estimating the parameter β by maximum likelihood. It is shown that if F is assumed to be a Gaussian process, there exists a unique unbiased estimator inline image of β with minimum variance. If we have n systematic observations of F:

image

φ∈[0,1/n[, then

image

where for j = 1, … , n– 1

image(5)

Using this maximum likelihood estimate of β we can estimate the prediction error (4) by

image(6)

The parametric model (3) for the covariance function was originally suggested in a design-based setting by Gual-Arnau & Cruz-Orive (2000; page 635). They provide an alternative estimator of the prediction error (4) based on the empirical covariogram ĝ at 0 and 1/n

image(7)

where B2p(t) is a Bernoulli polynomial of order 2p and

image

k = 0, 1, … , n – 1. Note that

image(8)

It can be shown that for n = 2 and n = 3, (6) and (7) coincide.

The estimator (7) only uses the empirical covariogram ĝ near the origin. Cruz-Orive & Gual-Arnau (2002) suggest a modified estimator

image(9)

using more values of the empirical covariogram. The estimators (6), (7) and (9) are all unbiased under the model (3).

An extension of the global covariogram model

The model (3) and its design-based analogue have the nice property that analytic expressions of the estimators are available. It turns out, however, that it is natural from a geometric point of view to consider an extension of this model. In a model-based setting, the extended model is known as the p–order model (Hobolth et al., 2002; Hobolth et al., 2003). The covariance function of the extended model is determined by Fourier coefficients of the form

image(10)

where inline image + inline image > 0, inline image > 0 and p > 1/2. It can be shown that p determines the smoothness of the stochastic process F. In fact, if we assume that F is a Gaussian process, F is m − 1 times continuously differentiable, where m is the integer satisfying p ∈ ]m − 1/2, m + 1/2]. For fixed p, inline image and inline image determine the global and local fluctuations of the stochastic process F, respectively. Small values of inline image provide large fluctuations of the process on a global scale, while large values give smaller fluctuations. Also, the smaller inline image, the more fluctuations of F on a local scale.

In particular, if F = R, where R is the radial function of a planar object K, star-shaped relative to z ∈ K,

image

0 ≤ t < 1, then p determines the smoothness of the boundary of the object K and, for fixed p, inline image and inline image determine the global and local shape of the object, respectively. If inline image is small, the global shape of the object K is expected to deviate from a circular shape. A small value of inline image is expected to provide an object boundary with many local fluctuations. Typically, in addition, the parameter λ1 is set to zero if the point z is approximately the centre of mass of the object K. For more details, see Hobolth et al. (2003).

As mentioned above, the model described in (3) is a special case of the p-order model. It is obtained by choosing p as a positive integer and

image(11)

It seems natural to include an additional parameter inline image to allow for more flexibility.

The estimator (6) of the prediction error provided under the restricted model (11) appears, however, to work well under the general model (10) if n is large and F is a Gaussian process. A heuristic argument goes as follows. In Hobolth & Jensen (2002), it is shown for a Gaussian process with a general covariance function that

image

where inline image is given in (5) and

image

For large n, we have under (10) that

image

Accordingly, the estimator (6) is approximately γnX where

image

and X∼χ2(n− 1)/(n − 1). Note that for n large, the prediction error is approximated by γn since

image

Preliminary simulation results

Throughout this section, the squared radial function of the simulated objects relative to the origin is given by

image(12)

0 ≤ t < 1, where all Ak, Bk ∼ N(0, λk) are mutually independent. In fact, if F is a stationary periodic Gaussian process, F is distributed as in (12). Note that since F is the squared radial function,

image

is the area of the simulated object.

We concentrate on objects with z approximately equal to the centre of mass so λ1 is set to zero. The remaining Fourier coefficients follow (10). It is natural to use a reparametrization of the model (Hobolth et al., 2003),

image(13)

such that α determines λk for small k≥ 2 while β determines λk for large k. Throughout the study, we use p = 2. As a consequence, the object boundary is continuously differentiable.
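
A simulation of this kind can be sketched in a few lines of Python. Because the displayed formulas are not reproduced here, the sketch rests on labelled assumptions: (13) is taken to be 1/λk = α + β(k^(2p) − 2^(2p)) for k ≥ 2, the series (12) is taken to be µ + √2 Σ (Ak cos 2πkt + Bk sin 2πkt), and log denotes the natural logarithm. The mean µ = 1, the truncation order k_max and all function names are illustrative choices, not values from the paper.

```python
import math
import random

def lambda_k(k, alpha, beta, p=2):
    """Assumed form of (13): 1/lambda_k = alpha + beta*(k^(2p) - 2^(2p)), k >= 2."""
    return 1.0 / (alpha + beta * (k ** (2 * p) - 2 ** (2 * p)))

def simulate_F(ts, alpha, beta, p=2, mu=1.0, k_max=200, seed=0):
    """Draw one realisation of F at the points ts (fractions of a full turn)."""
    rng = random.Random(seed)
    coeffs = []
    for k in range(2, k_max + 1):  # lambda_1 = 0, so the sum starts at k = 2
        sd = math.sqrt(lambda_k(k, alpha, beta, p))
        coeffs.append((k, rng.gauss(0.0, sd), rng.gauss(0.0, sd)))
    # Assumed normalisation of (12): mu + sqrt(2) * sum(A_k cos + B_k sin)
    return [mu + math.sqrt(2.0) * sum(a * math.cos(2 * math.pi * k * t)
                                      + b * math.sin(2 * math.pi * k * t)
                                      for k, a, b in coeffs)
            for t in ts]

n = 16
alpha, beta = math.exp(3.5), math.exp(-0.5)  # assumes natural logarithms
ts = [j / n for j in range(n)]               # systematic sample with phi = 0
F = simulate_F(ts, alpha, beta)
area_estimate = math.pi * sum(F) / n         # polar area formula applied to R^2
print(area_estimate)
```

The truncated Gaussian series does not enforce non-negativity of F; for parameter values such as those above, the fluctuations are small relative to the assumed mean, so negative values are unlikely but not excluded.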

Figure 1 shows, for selected values of α and β, simulated objects and a log-log plot of the true prediction error (2) together with the estimated prediction errors (6), (7) and (9) calculated from measurements on the shown objects. As supported by the reasoning in Section 3, the estimate (6) seems to perform well for not too small values of n. The other estimates are somewhat below the true prediction error. For small values of n, none of the estimators delivers satisfactory results. It should be emphasized that in Fig. 1 the simulated objects follow the model (13) but the estimators examined refer to the model (3).

Figure 1.

Simulations under the model (13) with (log α, log β) = (3.5, −0.5), (7.5, −0.5) and (7.5, 3.5) in the upper, middle and lower row, respectively. The true prediction error (solid curve) is shown in a log-log plot as a function of n, together with the estimated prediction errors (6), (7) and (9) shown as ◊, ⋆ and ○, respectively. The simulated objects are shown in the lower left corners of the plots.

In Hobolth et al. (2003), it is suggested to use the low frequency Fourier coefficients to estimate the parameters of the model (13). When F is a Gaussian process we can use the likelihood function

image(14)

to find estimates of α, β and p through standard numerical methods. Here,

image

is the kth phase amplitude, where ak and bk are the observed values of the Fourier coefficients Ak and Bk from (12). We used values of K approximately equal to n/3. For details about the choice of K, see Hobolth et al. (2003). In that paper, it is also shown that the estimation of p is not critical. For convenience, we fixed p = 2 and estimated α and β by maximizing L as a function of α and β. For this purpose, we used the fminsearch function in Matlab, which uses the simplex search method of Lagarias et al. (1998).
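
The fitting step can be illustrated with a small Python sketch. It is not the authors' Matlab code. It assumes that ak and bk are independent N(0, λk) observations for k = 2, …, K, so that the negative log-likelihood is a sum of terms log(2πλk) + ck/(2λk) with ck = ak² + bk², and that (13) has the form 1/λk = α + β(k^(2p) − 2^(2p)). A coarse grid search over (log α, log β) stands in for the simplex search only to keep the sketch free of dependencies; all names are illustrative.

```python
import math

def lambda_k(k, alpha, beta, p=2):
    """Assumed form of (13): 1/lambda_k = alpha + beta*(k^(2p) - 2^(2p)), k >= 2."""
    return 1.0 / (alpha + beta * (k ** (2 * p) - 2 ** (2 * p)))

def neg_log_lik(log_alpha, log_beta, c, p=2):
    """c maps k (2..K) to the assumed squared amplitude a_k^2 + b_k^2."""
    alpha, beta = math.exp(log_alpha), math.exp(log_beta)
    total = 0.0
    for k, ck in c.items():
        lam = lambda_k(k, alpha, beta, p)
        total += math.log(2 * math.pi * lam) + ck / (2 * lam)
    return total

def grid_fit(c, p=2, lo=-2.0, hi=10.0, step=0.25):
    """Coarse grid search over (log alpha, log beta); a stand-in for a simplex search."""
    m = int(round((hi - lo) / step)) + 1
    grid = [lo + i * step for i in range(m)]
    return min(((la, lb) for la in grid for lb in grid),
               key=lambda ab: neg_log_lik(ab[0], ab[1], c, p))

# Toy data: squared amplitudes set to their expected values 2*lambda_k
true_alpha, true_beta = math.exp(3.5), math.exp(-0.5)
c = {k: 2 * lambda_k(k, true_alpha, true_beta) for k in range(2, 6)}
print(grid_fit(c))
```

With these noise-free toy amplitudes, each term of the negative log-likelihood is minimized when λk = ck/2, so the search returns the generating values of (log α, log β). With real data, the estimates vary with the observed amplitudes, and a simplex routine would refine them beyond the grid resolution.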

Since the prediction error is given by

image(15)

and

image

for k > 2, we can use the estimated values of α and β to obtain an estimate of the prediction error. In the log-log plots of Fig. 2, we show the true prediction error (solid curve) as a function of n, together with the estimated prediction errors (6) and (7) and the prediction error obtained by inserting the estimated values of α and β into (15). The estimates are based on measurements on the shown objects. The estimate (7) lies somewhat below the other estimates, which are quite close to the true prediction error. One should note, though, that the parameters in the model (13) are difficult to estimate if n is smaller than ten (Hobolth et al., 2003). The reason is that the Fourier coefficients ak and bk are determined as discretized Fourier integrals based on n measurements of the radial function.
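
The plug-in step can be sketched as follows. The sketch assumes that the prediction error takes the form 2 Σ λmn, summed over m ≥ 1, which holds when the covariance function is written as σ(t) = λ0 + 2 Σ λk cos(2πkt) and Q is the integral of F(2πt) over [0, 1[. It also assumes the form 1/λk = α + β(k^(2p) − 2^(2p)) for (13). The parameter values, the truncation of the series and the function names are illustrative.

```python
import math

def lambda_k(k, alpha, beta, p=2):
    """Assumed form of (13): 1/lambda_k = alpha + beta*(k^(2p) - 2^(2p)), k >= 2."""
    return 1.0 / (alpha + beta * (k ** (2 * p) - 2 ** (2 * p)))

def prediction_error(n, alpha, beta, p=2, terms=10000):
    """Truncated series 2 * sum_{m>=1} lambda_{m*n}; needs n >= 2 since lambda_1 = 0."""
    return 2.0 * sum(lambda_k(m * n, alpha, beta, p) for m in range(1, terms + 1))

alpha, beta = math.exp(3.5), math.exp(-0.5)  # in practice, the estimated values
for n in (4, 8, 16, 32, 64):
    print(n, prediction_error(n, alpha, beta))
```

Under these assumptions, λmn behaves like 1/(β(mn)^(2p)) for large n, so the prediction error decays roughly as n^(−2p); with p = 2 this corresponds to a slope of about −4 in a log-log plot.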

Figure 2.

Simulations under the model (13) with (log α, log β) = (3.5, −0.5), (7.5, −0.5) and (7.5, 3.5) in the upper, middle and lower row, respectively. In the log-log plots, the true prediction error (solid curve) is shown as a function of n, together with the estimated prediction errors (6) and (7), shown as ◊ and ○, and the estimate obtained by plugging in estimates of α and β into (15), shown as ⋆. The simulated objects are shown in the lower left corners of the plots.

All these results call for a closer investigation of the model (13) and its use in assessment of the precision of circular systematic sampling.

Acknowledgements

We want to thank the referees for constructive comments on the manuscript. This research has been supported by the Danish Natural Science Research Council. Lars Michael Hoffmann has been supported by a Marie Curie Research Fellowship under contract number HPMT-CT-2001-0364.
