# On error prediction in circular systematic sampling

## Authors

K. Ý. Jónsdóttir. Tel: +45 8949 4402; fax +45 8949 4400; e-mail: kyj@imf.au.dk

## Summary

An extended covariogram model is discussed for estimating the precision of circular systematic sampling. The extension is motivated by recent developments in shape analysis of featureless planar objects. Preliminary simulation results indicate that it is important to consider the extended covariogram model.

## Introduction

Recently, the precision of systematic sampling on the circle has been discussed in Gual-Arnau & Cruz-Orive (2000) and Cruz-Orive & Gual-Arnau (2002). In particular, variance estimation formulae based on a global polynomial model for the covariogram have been developed. In Hobolth & Jensen (2002), this approach is discussed both in a design-based and a model-based setting, and an alternative model-based method of estimating the parameter of the covariogram is described.

In this note, we summarize these developments and argue that it may be natural to consider an extension of the polynomial covariogram model; see also the discussion in Hobolth & Jensen (2002). We explain the geometric interpretation of the parameters of the proposed extended model and report preliminary simulation results.

## A global polynomial covariogram model

In this note, we will focus on the model-based approach to error prediction in circular systematic sampling. In a model-based setting, the aim is to predict an integral of the form

where

is a stationary periodic non-negative stochastic process of bounded variation, square integrable and piecewise continuous. As F is stationary, the mean 𝔼F(2πt) does not depend on t and equals µ, say, and the covariance

(1)

0 ≤ h, t < 1, does not depend on h and will be denoted σ(t). In (1), we use a periodic extension of F. The covariance function satisfies σ(t) = σ(1 −t) and therefore, its Fourier expansion takes the form

0 ≤ t < 1. The predictor of Q to be considered is of the form

where φ ∈ [0, 1/n[. Note that the distribution of Q̂(F, φ, n) does not depend on φ, due to the stationarity of F. The prediction error of using Q̂(F, φ, n) as a predictor of Q can be expressed in terms of the Fourier coefficients λk of the covariance function. We thus have

(2)

Note that the prediction error only depends on Fourier coefficients of order n and higher.
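The remark above can be checked numerically. The following Python sketch assumes that the predictor is the plain average of the integrand at the n equally spaced points φ + j/n; the helper names `systematic_mean` and `harmonic` are ours and purely illustrative. It suggests that such an average integrates the k-th cosine harmonic exactly unless k is a multiple of n, which is why only coefficients of order n and higher can contribute to the error.

```python
import math

def systematic_mean(f, n, phi):
    """Average of f over the n equally spaced points phi + j/n (assumed predictor form)."""
    return sum(f(phi + j / n) for j in range(n)) / n

def harmonic(k):
    """k-th cosine harmonic t -> cos(2*pi*k*t); its integral over [0, 1) is zero for k >= 1."""
    return lambda t: math.cos(2 * math.pi * k * t)

n, phi = 5, 0.03  # illustrative sample size and start point, phi in [0, 1/n)

# The exact integral of each harmonic is zero, so the systematic mean is the sampling error.
errors = {k: systematic_mean(harmonic(k), n, phi) for k in range(1, 12)}
for k, err in errors.items():
    print(f"k={k:2d}  sampling error={err:+.6f}")
```

With n = 5, only the harmonics k = 5 and k = 10 should show an error that is not numerically zero.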

In Hobolth & Jensen (2002), a parametric model for the covariance function is considered

(3)

where p is a positive integer and the other model parameters β0 and β are chosen such that λk ≥ 0 for k = 0, 1, … . Under (3), the prediction error depends only on β

(4)

where B2p is a Bernoulli number. For more details on Bernoulli numbers and the associated Bernoulli polynomials, see Abramowitz & Stegun (1965).
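For readers who want to experiment with these quantities, the sketch below computes Bernoulli numbers exactly and compares the Bernoulli polynomial B2p(t) with its classical Fourier series on [0, 1], whose k-th coefficient is proportional to k^(−2p). The function names are ours, and the code is an illustration under these stated assumptions, not the computation used in the cited papers.

```python
from fractions import Fraction
from math import comb, cos, factorial, pi

def bernoulli_numbers(m):
    """Return [B_0, ..., B_m] as exact fractions (convention B_1 = -1/2)."""
    B = [Fraction(1)]
    for n in range(1, m + 1):
        # Recurrence: sum_{k=0}^{n} C(n+1, k) * B_k = 0 for n >= 1.
        B.append(-sum(comb(n + 1, k) * B[k] for k in range(n)) / (n + 1))
    return B

def bernoulli_poly(m, t):
    """Evaluate B_m(t) = sum_{k=0}^{m} C(m, k) * B_k * t^(m-k)."""
    B = bernoulli_numbers(m)
    return sum(comb(m, k) * float(B[k]) * t ** (m - k) for k in range(m + 1))

def bernoulli_fourier(p, t, terms=2000):
    """Truncated Fourier series of B_{2p}(t) for 0 <= t <= 1."""
    c = (-1) ** (p + 1) * 2 * factorial(2 * p) / (2 * pi) ** (2 * p)
    return c * sum(cos(2 * pi * k * t) / k ** (2 * p) for k in range(1, terms + 1))

print("B_2 =", bernoulli_numbers(4)[2], " B_4 =", bernoulli_numbers(4)[4])
print("B_4(0.3) polynomial:", bernoulli_poly(4, 0.3))
print("B_4(0.3) Fourier   :", bernoulli_fourier(2, 0.3))
```

The k^(−2p) decay of the Fourier coefficients is the property that ties a polynomial covariogram of this type to a power-law spectrum.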

Hobolth & Jensen (2002) suggest estimating the parameter β by maximum likelihood. It is shown that if F is assumed to be a Gaussian process, there exists a unique unbiased estimator of β with minimum variance. If we have n systematic observations of F:

φ∈[0,1/n[, then

where for j = 1, …, n − 1

(5)

Using this maximum likelihood estimate of β we can estimate the prediction error (4) by

(6)

The parametric model (3) for the covariance function was originally suggested in a design-based setting by Gual-Arnau & Cruz-Orive (2000; page 635). They provide an alternative estimator of the prediction error (4) based on the empirical covariogram ĝ at 0 and 1/n

(7)

where B2p(t) is a Bernoulli polynomial of order 2p and

k = 0, 1, … , n – 1. Note that

(8)

It can be shown that for n = 2 and n = 3, (6) and (7) coincide.

The estimator (7) only uses the empirical covariogram ĝ near the origin. Cruz-Orive & Gual-Arnau (2002) suggest a modified estimator

(9)

using more values of the empirical covariogram. The estimators (6), (7) and (9) are all unbiased under the model (3).

## An extension of the global covariogram model

The model (3) and its design-based analogue have the nice property that analytic expressions of the estimators are available. It turns out, however, that it is natural from a geometric point of view to consider an extension of this model. In a model-based setting, the extended model is known as the p-order model (Hobolth et al., 2002; Hobolth et al., 2003). The covariance function of the extended model is determined by Fourier coefficients of the form

(10)

where  + > 0, > 0 and p >1/2. It can be shown that p determines the smoothness of the stochastic process F. In fact, if we assume that F is a Gaussian process, F is m − 1 times continuously differentiable where m is the integer satisfying p∈]m– 1/2, m+ 1/2]. For fixed p, and determine the global and local fluctuations of the stochastic process F, respectively. Small values of provide large fluctuations of the process on a global scale, while large values give smaller fluctuations. Also, the smaller , the more fluctuations of F on a local scale.

In particular, if F = R, where R is the radial function of a planar object K, star-shaped relative to zK,

0 ≤ t < 1, then p determines the smoothness of the boundary of the object K and for fixed p, and determine the global and local shape of the object, respectively. If is small, the global shape of the object K is expected to deviate from circular shape. A small value of is expected to provide an object boundary with many local fluctuations. Typically, in addition, the parameter λ1 is set to zero if the point z is approximately the centre of mass of the object K. For more details, see Hobolth et al. (2003).

As mentioned above the model described in (3) is a special case of the p-order model. It is obtained by choosing p as a positive integer and

(11)

It seems natural to include an additional parameter to allow for more flexibility.

The estimator (6) of the prediction error provided under the restricted model (11) appears, however, to work well under the general model (10) if n is large and F is a Gaussian process. A heuristic argument goes as follows. In Hobolth & Jensen (2002), it is shown for a Gaussian process with a general covariance function that

where is given in (5) and

For large n, we have under (10) that

Accordingly, the estimator (6) is approximately γnX where

and X ∼ χ²(n − 1)/(n − 1). Note that for n large, the prediction error is approximated by γn since

## Preliminary simulation results

Throughout this section, the squared radial function of the simulated objects relative to the origin is given by

(12)

0 ≤ t < 1, where all Ak, Bk ∼ N(0, λk) are mutually independent. In fact, if F is a stationary periodic Gaussian process, F is distributed as in (12). Note that since F is the squared radial function,

is the area of the simulated object.
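A simulation of this kind might be sketched as follows. The code assumes a truncated series of the form F(2πt) = µ + Σk [Ak cos(2πkt) + Bk sin(2πkt)] with independent Ak, Bk ∼ N(0, λk); the normalization, the truncation at k = 30 and the numerical values of λk are our assumptions for illustration and are not taken from the paper. The area is computed from the polar-coordinate formula, that is, π times the integral of F(2πt) over [0, 1).

```python
import math
import random

def simulate_coefficients(lambdas, rng):
    """Draw independent A_k, B_k ~ N(0, lambda_k) for each frequency k."""
    return {k: (rng.gauss(0.0, math.sqrt(lam)), rng.gauss(0.0, math.sqrt(lam)))
            for k, lam in lambdas.items()}

def squared_radius(t, mu, coeffs):
    """Assumed series: F(2*pi*t) = mu + sum_k (A_k cos(2 pi k t) + B_k sin(2 pi k t))."""
    return mu + sum(a * math.cos(2 * math.pi * k * t) + b * math.sin(2 * math.pi * k * t)
                    for k, (a, b) in coeffs.items())

def systematic_area(n, phi, mu, coeffs):
    """Area estimate: pi times the mean of F at n equally spaced points starting at phi."""
    return math.pi * sum(squared_radius(phi + j / n, mu, coeffs) for j in range(n)) / n

rng = random.Random(1)
# Hypothetical variances for k = 2..30 (lambda_1 = 0); the values are illustrative only.
lambdas = {k: 1.0 / (50.0 + 0.5 * (k ** 4 - 2 ** 4)) for k in range(2, 31)}
mu = 1.0
coeffs = simulate_coefficients(lambdas, rng)
true_area = math.pi * mu  # every harmonic integrates to zero over a full period

for n in (4, 8, 16, 32):
    est = systematic_area(n, 0.01, mu, coeffs)
    print(f"n={n:2d}  estimate={est:.6f}  error={est - true_area:+.2e}")
```

Because the series is truncated at k = 30 in this sketch, the estimate with n = 32 reproduces the area up to rounding; for a non-truncated process the error would decay with n rather than vanish.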

We concentrate on objects with z approximately equal to the centre of mass so λ1 is set to zero. The remaining Fourier coefficients follow (10). It is natural to use a reparametrization of the model (Hobolth et al., 2003),

(13)

such that α determines λk for small k≥ 2 while β determines λk for large k. Throughout the study, we use p = 2. As a consequence, the object boundary is continuously differentiable.

Figure 1 shows, for selected values of α and β, simulated objects and a log-log plot of the true prediction error (2) together with the estimated prediction errors (6), (7) and (9) calculated from measurements on the shown objects. As supported by the reasoning in Section 3, the estimate (6) seems to perform well for not too small values of n. The other estimates are somewhat below the true prediction error. For small values of n, none of the estimators delivers satisfactory results. It should be emphasized that in Fig. 1 the simulated objects follow the model (13) but the estimators examined refer to the model (3).

In Hobolth et al. (2003), it is suggested to use the low frequency Fourier coefficients to estimate the parameters of the model (13). When F is a Gaussian process we can use the likelihood function

(14)

to find estimates of α, β and p through standard numerical methods. Here,

is the kth phase amplitude, where ak and bk are the observed values of the Fourier coefficients Ak and Bk from (12). We used values of K approximately equal to n/3. For details about the choice of K, see Hobolth et al. (2003). In that paper, it is also shown that the estimation of p is not critical. For convenience, we used p = 2, but estimated α and β by maximizing L as a function of α and β. For this purpose, we used the fminsearch function in MATLAB, which uses the simplex search method of Lagarias et al. (1998).
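The fitting step can be illustrated by a small, dependency-free sketch. It rests on several assumptions of ours: the phase amplitude is taken to be ck = ak² + bk², the variances are taken to follow the hypothetical form λk = 1/(α + β(k^(2p) − 2^(2p))) for k ≥ 2, and a coarse grid search replaces the simplex method used in the paper. In Python, the closer analogue of fminsearch would be SciPy's `scipy.optimize.minimize` with `method="Nelder-Mead"`.

```python
import math
import random

def lam(k, alpha, beta, p=2):
    """Hypothetical variance model: 1 / (alpha + beta * (k^(2p) - 2^(2p))), k >= 2."""
    return 1.0 / (alpha + beta * (k ** (2 * p) - 2 ** (2 * p)))

def neg_log_lik(alpha, beta, amplitudes, p=2):
    """Negative log-likelihood of c_k = a_k^2 + b_k^2 when A_k, B_k ~ N(0, lambda_k)."""
    total = 0.0
    for k, c in amplitudes.items():
        l = lam(k, alpha, beta, p)
        # c_k is exponentially distributed with mean 2 * lambda_k.
        total += math.log(2.0 * l) + c / (2.0 * l)
    return total

def grid_fit(amplitudes, alphas, betas):
    """Return the (alpha, beta) grid point with the smallest negative log-likelihood."""
    return min(((a, b) for a in alphas for b in betas),
               key=lambda ab: neg_log_lik(ab[0], ab[1], amplitudes))

rng = random.Random(0)
true_alpha, true_beta = 50.0, 0.5  # illustrative values only
amplitudes = {}
for k in range(2, 12):  # low-frequency coefficients k = 2..K with K = 11
    sd = math.sqrt(lam(k, true_alpha, true_beta))
    amplitudes[k] = rng.gauss(0.0, sd) ** 2 + rng.gauss(0.0, sd) ** 2

alphas = [10.0, 20.0, 50.0, 100.0, 200.0]
betas = [0.1, 0.2, 0.5, 1.0, 2.0]
fit = grid_fit(amplitudes, alphas, betas)
print("grid estimate (alpha, beta):", fit)
```

With only ten amplitudes the estimate is noisy, so the grid point returned need not coincide with the values used to simulate the data.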

Since the prediction error is given by

(15)

and

for k > 2, we can use the estimated values of α and β to obtain an estimate of the prediction error. In the log-log plots of Fig. 2, we show the true prediction error (solid curve) as a function of n, together with the estimated prediction errors (6) and (7) and the prediction error obtained by inserting the estimated values of α and β into (15). The estimates are based on measurements on the shown objects. The estimate (7) lies somewhat below the other estimates, which are quite close to the true prediction error. One should note, though, that the parameters in the model (13) are difficult to estimate if n is smaller than ten (Hobolth et al., 2003). The reason is that the Fourier coefficients ak and bk are determined as discretized Fourier integrals based on n measurements of the radial function.

All these results call for a closer investigation of the model (13) and its use in assessment of the precision of circular systematic sampling.

## Acknowledgements

We want to thank the referees for constructive comments on the manuscript. This research has been supported by the Danish Natural Science Research Council. Lars Michael Hoffmann has been supported by a Marie Curie Research Fellowship under contract number HPMT-CT-2001-0364.