The methods of statistical process control (e.g. Montgomery (2009) and Oakland (2008)) have a long history of application to problems in public health surveillance (Woodall, 2006). Several proposed approaches for the on-line detection of outbreaks of infectious diseases are directly inspired by, or related to, methods of statistical process control. This is not surprising, because the problem of prospectively detecting unusual clusters of disease in epidemiological data is similar to that of detecting aberrations in industrial production processes as they arise. The main tools for tracking the characteristics of a production process over time are control charts, which are discussed in Section 4.1. Sections 4.2 and 4.3 consider further methods that share a flavour of the statistical process control methodology, namely temporal scan statistics and methods based on the time to failure.

#### 4.1. Control charts

The first control chart was proposed by Shewhart (1931) (see Section 2). The Shewhart chart utilizes information about only the last time point. Later, Page (1954) and Roberts (1959) derived control charts with memory: the cumulative sum (CUSUM) and exponentially weighted moving average (EWMA) control charts respectively. To start with the former, let {*y*_{t},*t*=1,2,…} denote the time series of the counts being monitored. Assuming that *y*_{t}∼N(*μ*_{t},*σ*_{t}^{2}), the one-sided (standardized) Gaussian CUSUM at time *t* is defined iteratively by

$$C_t=\max\left\{0,\;C_{t-1}+\frac{y_t-\mu_t}{\sigma_t}-k\right\}, \tag{7}$$

where *C*_{0}=0 and *k*>0 is a constant that depends on the size of aberration of interest. It is often chosen to be *k*=1/2 (Rogerson and Yamada, 2004a). The baselines *μ*_{t} can be calculated from counts in comparable periods in previous years. These counts are also used to estimate the standard deviation *σ*_{t}. In the absence of any systematic departure from the expected values *μ*_{t}, the statistic in equation (7) tends to remain at or close to 0. If *C*_{t}>*h*, where *h* is a specified threshold value, the process is declared to be out of control. Usually, the CUSUM is then reset to 0 and the process starts again. There are many variants of this basic procedure, though. For example, one system restarts the process with the CUSUM set to half the alerting threshold, to increase sensitivity to early signals (Lucas and Crosier, 2000). Methods based on the CUSUM formula are implemented in the early aberration reporting system of the Centers for Disease Control and Prevention, which is used throughout the USA as a syndromic surveillance system (Hutwagner *et al.*, 2003).
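The recursion and the reset rules described above can be illustrated with a minimal Python sketch. The function name `gaussian_cusum`, its arguments and the default values of `k` and `h` are illustrative assumptions rather than part of any surveillance system cited here; the baselines and standard deviations are taken as given.

```python
def gaussian_cusum(counts, baselines, sds, k=0.5, h=4.0, restart=0.0):
    """One-sided standardized Gaussian CUSUM with a restart after each alert.

    Returns (path, alerts): the CUSUM value at every time point and the
    0-based indices at which the threshold h was exceeded.
    The defaults for k and h are illustrative assumptions only.
    """
    c = 0.0  # C_0 = 0
    path, alerts = [], []
    for t, (y, mu, sigma) in enumerate(zip(counts, baselines, sds)):
        z = (y - mu) / sigma       # standardized residual
        c = max(0.0, c + z - k)    # C_t = max{0, C_{t-1} + z_t - k}
        path.append(c)
        if c > h:                  # process declared out of control
            alerts.append(t)
            c = restart            # 0.0: plain reset; h / 2: head start
    return path, alerts


# Hypothetical weekly counts with a constant baseline of 10 and sd of 3.
counts = [10, 12, 9, 11, 18, 20, 22, 10]
path, alerts = gaussian_cusum(counts, [10] * 8, [3] * 8)
path_fir, alerts_fir = gaussian_cusum(counts, [10] * 8, [3] * 8, restart=2.0)
```

With a plain reset the sketch flags one time point in these made-up data; restarting at half the threshold flags the following week as well, which shows how the head start increases sensitivity immediately after an alert.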

Fig. 3(b) shows the CUSUM for *Salmonella enteritidis* in England, Wales and Northern Ireland in the year 2009. An outbreak from the 28th week onwards is detected by the CUSUM (7). The values in Fig. 3(a) are the weekly counts from the previous years 2000–2008 which were used to calculate *μ*_{t} and *σ*_{t}. The threshold *h* was chosen on the basis of a predetermined acceptable value for the in-control average run length ARL_{0}, i.e. the average time between alerts when there is no outbreak. The reciprocal of ARL_{0} is the false positive (or false discovery) rate, i.e. the proportion of apparently aberrant reports that are not associated with outbreaks. Tables that can be used to find the value of *h* that is associated with chosen values of ARL_{0} and *k* are available (see for instance Rogerson (2001)).
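Where tables are not to hand, the in-control average run length for a candidate threshold can also be approximated by simulation. The following sketch assumes independent standard normal residuals when there is no outbreak; the function names, the grid search and the number of simulated runs are assumptions made for illustration.

```python
import random


def run_length(h, k, rng, max_steps=100_000):
    """Time of the first alert of the CUSUM on in-control N(0, 1) residuals."""
    c = 0.0
    for t in range(1, max_steps + 1):
        c = max(0.0, c + rng.gauss(0.0, 1.0) - k)
        if c > h:
            return t
    return max_steps  # censored: no alert within max_steps


def estimate_arl0(h, k=0.5, n_runs=500, seed=1):
    """Monte Carlo estimate of ARL_0 for threshold h and reference value k."""
    total = 0
    for i in range(n_runs):
        # One seeded generator per run, so that different thresholds are
        # compared on identical simulated residual series.
        total += run_length(h, k, random.Random(seed + i))
    return total / n_runs


def choose_threshold(target_arl0, grid, **kwargs):
    """Smallest h in the grid whose estimated ARL_0 reaches the target."""
    for h in sorted(grid):
        if estimate_arl0(h, **kwargs) >= target_arl0:
            return h
    return None  # no candidate in the grid is large enough
```

Because every run reuses the same simulated series for all thresholds, the estimated ARL_{0} is non-decreasing in *h*, which keeps the grid search stable even with a modest number of runs.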

In the case of rare events, the CUSUM approach (7) is not adequate, since the counts do not have a normal distribution. One remedy is to use the Poisson CUSUM (Lucas, 1985). Other methods that are used in disease surveillance to detect an increase in the mean of a Poisson distribution include, for example, the short memory scheme of Shore and Quade (1989), which is based on the distribution of cases in the current and previous periods. Kenett and Pollak (1996) used the Shiryaev–Roberts statistic (Shiryaev, 1963; Roberts, 1966) and applied it to a non-homogeneous Poisson process. Whereas Gaussian or Poisson CUSUMs are designed to analyse counted data, binomial CUSUMs (e.g. Reynolds and Stoumbos (2000)) can be used to monitor proportions.
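A minimal sketch of a Poisson CUSUM for raw counts is given below. It assumes the commonly used reference value *k*=(*μ*_{1}−*μ*_{0})/(log *μ*_{1}−log *μ*_{0}) for detecting a shift in the mean from *μ*_{0} to *μ*_{1}>*μ*_{0}, which is not stated in the text above; the function names and the example data are hypothetical.

```python
import math


def poisson_reference_value(mu0, mu1):
    """Reference value k for a shift in the Poisson mean from mu0 to mu1 > mu0.

    Assumed form: k = (mu1 - mu0) / (log(mu1) - log(mu0)).
    """
    return (mu1 - mu0) / (math.log(mu1) - math.log(mu0))


def poisson_cusum(counts, k, h):
    """One-sided CUSUM on raw counts; returns 0-based alert indices."""
    c = 0.0
    alerts = []
    for t, y in enumerate(counts):
        c = max(0.0, c + y - k)  # counts above k push the statistic up
        if c > h:
            alerts.append(t)
            c = 0.0              # reset after an alert
    return alerts


# Hypothetical rare-event counts: in-control mean 1, shift of interest to 2.
k = poisson_reference_value(1.0, 2.0)
alerts = poisson_cusum([0, 1, 0, 2, 1, 4, 5, 0], k, h=4.0)
```

The reference value always lies between the in-control and out-of-control means, so counts typical of the in-control state pull the statistic back towards 0.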

Because CUSUMs are sensitive to small sustained changes in the mean numbers of reports, they are well suited to detecting relatively long lasting epidemics, such as influenza. However, for the same reason, they are sensitive to small changes in reporting efficiency and other artefacts of the reporting process. Thus, they may lack robustness when used with surveillance data unless the baselines are frequently reset.

The EWMA control chart gives progressively less weight to older data. The EWMA is defined by the recursive equation

$$z_t=\gamma y_t+(1-\gamma)z_{t-1}, \tag{8}$$

where *z*_{0}=0 and the weight parameter *γ* ∈ (0,1]. The weighting for each older data point decreases exponentially, giving much more importance to recent observations, while not discarding older observations entirely. For *γ*=1, equation (8) is the same as the method by Shewhart (1931). The asymptotic (one-sided) variant of the EWMA chart will give an alarm at

$$t_A=\min\{t\geq 1: z_t>L\sigma_z\}, \tag{9}$$

where *L*>0 is a constant and *σ*_{z} is the asymptotic standard deviation of *z*_{t} (Sonesson, 2003). Alternatively, one can use the exact standard deviation of *z*_{t} (which increases with time) in place of the asymptotic one in the alarm limit (9). For the EWMA chart, Elbert and Burkom (2009) and Burkom *et al.* (2007) proposed the Holt–Winters technique for generalized exponential smoothing (see Section 3.1) to account for trends and seasonal features in syndromic data. Dong *et al.* (2008) constructed three types of EWMA methods that do not require an assumption of identical distributions of the counts to detect a positive shift in the rate of incidence. Adaptations of the EWMA method for Poisson and binomial data are available (Borror *et al.*, 1998; Gan, 1991). Using the exponential smoothing technique and properties of numerical derivatives, Nobre and Stroup (1994) developed a method which bases monitoring on changes in the numerical gradient of the variable under surveillance with respect to time.

Höhle and Paul (2008) presented count data regression charts, which accommodate seasonal variation in the mean of the infectious disease counts. Assume that the observed counts originate from a negative binomial distribution parameterized by its mean *μ* and dispersion parameter *θ*. For *θ*→0 the Poisson distribution with mean *μ* is obtained. For the in-control situation *y*_{t}∼NegBin(*μ*_{0,t},*θ*), where

$$\log(\mu_{0,t})=\beta_0+\beta_1 t+c(t), \tag{10}$$

and *c*(*t*) is a cyclic function that may be modelled, for example, by trigonometric terms (Serfling, 1963), i.e. the in-control mean is assumed to be time varying and linear on the log-scale. The out-of-control situation is characterized by a multiplicative shift *μ*_{1,t}=*μ*_{0,t} exp (*κ*) with *κ*≥0, which corresponds to an additive increase in the mean on the log-scale. It is assumed that the in-control parameters are known, whereas *κ* is unknown and is estimated via maximum likelihood. A generalized likelihood ratio statistic is computed to detect, on line, whether a shift in the intercept occurred. Extensions of the basic seasonal count data regression chart are available that take account of auto-correlation between observations (Höhle and Paul, 2008) or the population size of the age strata (Höhle and Mazick, 2010). Other modified CUSUM methods that allow for time varying Poisson means were proposed by Rossi *et al.* (1999) and Rogerson and Yamada (2004b).
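To make the in-control and out-of-control means of the regression chart concrete, the sketch below assumes a log-linear mean with an intercept, a linear time trend and a single annual harmonic for the cyclic term. The parameter names, the choice of one harmonic and the period of 52 weeks are illustrative assumptions; the generalized likelihood ratio detector itself is not implemented here.

```python
import math


def in_control_mean(t, beta0, beta1, amp_sin, amp_cos, period=52):
    """In-control mean: exp(beta0 + beta1 * t + c(t)).

    c(t) is modelled by a single sine/cosine pair with the given period,
    which is an assumption made for this sketch.
    """
    omega = 2.0 * math.pi * t / period
    c_t = amp_sin * math.sin(omega) + amp_cos * math.cos(omega)
    return math.exp(beta0 + beta1 * t + c_t)


def out_of_control_mean(t, kappa, **params):
    """Out-of-control mean: the in-control mean multiplied by exp(kappa)."""
    return in_control_mean(t, **params) * math.exp(kappa)


# Hypothetical parameter values.
params = dict(beta0=1.0, beta1=0.01, amp_sin=0.3, amp_cos=-0.2)
m0 = in_control_mean(10, **params)
m1 = out_of_control_mean(10, kappa=0.5, **params)
log_shift = math.log(m1) - math.log(m0)  # equals kappa up to rounding
```

The difference of the two means on the log-scale recovers *κ* at every time point, which is why the shift can be described as a change in the intercept.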

The use of control charts has also been widely advocated for the surveillance of healthcare-associated infections (Benneyan, 1998a,b; Woodall, 2006; Carey, 2003; Limaye *et al.*, 2008). In this context, CUSUM charts are more frequently useful than EWMA charts (Woodall, 2006), but Shewhart charts appear to be the charts that have found greatest application, often being used to show the proportion of incidents in fixed periods of time. For example, they have been used in this way to monitor anaesthesia-related adverse events (Fasting and Gisvold, 2003) and risk-adjusted mortality rates of patients in hospital following admission for acute myocardial infarction (Coory *et al.*, 2008). Morton *et al.* (2001) considered the application of Shewhart, CUSUM and EWMA charts for continuous real-time monitoring of various hospital-acquired infections, such as vascular surgical site infection and *Klebsiella pneumoniae*. It was concluded that Shewhart and EWMA charts are together ideal for monitoring bacteraemia and multiresistant organism rates, whereas Shewhart and CUSUM charts together are suitable for surgical infection surveillance.

#### 4.2. Temporal scan statistics

Scan statistics (e.g. Glaz *et al.* (2001)) can be used to detect and evaluate clusters of disease cases in either a purely temporal, purely spatial or space–time setting (Woodall *et al.*, 2008). In a temporal setting, this is usually done by gradually moving a window across time and comparing the observed number of cases inside the window with the expected number. The scan statistic has long been used for retrospective detection of temporal clusters in epidemiology (Wallenstein, 1980). Kulldorff (2001), Ismail *et al.* (2003) and Naus and Wallenstein (2006) adapted the scan statistic for use in prospective temporal surveillance.

There are two general types of prospective temporal scan-based methods. One type involves counting the number of cases in a single region in the most recent time period (or window) of a fixed length (Ismail *et al.*, 2003; Naus and Wallenstein, 2006). Let *y*_{n} denote the observation at the current time point *n* and let *L* be the fixed window size. The scan statistic can be viewed as an unweighted moving sum (Han *et al.*, 2010; Joner *et al.*, 2008):

$$S_n=\sum_{t=n-L+1}^{n}y_t. \tag{11}$$

An alert is flagged as soon as equation (11) exceeds a threshold *h*, i.e. the first time that *S*_{n}>*h*, where *h* is typically chosen in conjunction with an acceptable value of ARL_{0}, although choosing *h* so that the type I error is a predetermined value *α* has also been suggested (Naus and Wallenstein, 2006).
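The fixed-window scan can be sketched as a running sum over the most recent observations. The function name, the choice to raise alerts only once a full window is available and the example data are assumptions made for illustration.

```python
def moving_sum_scan(counts, window, h):
    """Fixed-window temporal scan as an unweighted moving sum.

    Returns (sums, first_alert): the sum of the most recent `window`
    observations at each time point (fewer at the start of the series)
    and the 0-based index of the first full-window sum exceeding h,
    or None if the threshold is never exceeded.
    """
    sums = []
    first_alert = None
    running = 0
    for n, y in enumerate(counts):
        running += y
        if n >= window:
            running -= counts[n - window]  # drop the observation leaving the window
        sums.append(running)
        full_window = n >= window - 1
        if first_alert is None and full_window and running > h:
            first_alert = n
    return sums, first_alert


# Hypothetical counts, window of 3 time points, threshold 6.
sums, first_alert = moving_sum_scan([1, 0, 2, 1, 3, 4, 0], window=3, h=6)
```

In this made-up series the window sum first reaches the threshold exactly, which does not trigger an alert, and exceeds it one time point later.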

In the prospective temporal scan method of Kulldorff (2001), the length of the window is not a constant but varies over a range of values (see also Wallenstein and Naus (2004)). Since the temporal scan statistic of Kulldorff (2001) can be viewed as a special case of his spatiotemporal procedure, a discussion of this method is deferred until Section 5. Public health surveillance data are often non-stationary, with seasonal and other effects that are seldom found in industrial process control data. Wallenstein and Naus (2004) proposed a temporal scan method that can account for seasonal effects.

#### 4.3. Methods based on interevent times

Methods which base detection on total reports will fail when events are very rare, because even a single report will then be unusual in a statistical sense. In such cases, one might either specify a minimum size of outbreak that must be exceeded for the count to qualify as an aberration (Farrington *et al.*, 1996), impose a lower bound on the standard error used to normalize residuals or alternatively use the ‘sets monitoring technique’ (Chen, 1978), which bases detection on the time intervals between reports (see Farrington and Andrews (2004) for a brief review of this methodology). Sego *et al.* (2008) proposed the Bernoulli CUSUM chart for the surveillance of rare health events instead.

Other techniques based on the time to failure have been proposed, such as time between event (exponential) CUSUM or EWMA schemes (e.g. Gan (1994, 1998)). Exponential control charts arise naturally in the context of monitoring the rate of occurrence of rare events, since interevent times for a homogeneous Poisson process are exponentially distributed random variables.
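As an illustration of monitoring interevent times, the sketch below accumulates evidence that the gaps between events have become shorter, i.e. that the event rate has increased. It is a generic likelihood-ratio-based CUSUM for exponential gaps and is not claimed to reproduce the designs of Gan (1994, 1998); the reference value *k*=log(*λ*_{1}/*λ*_{0})/(*λ*_{1}−*λ*_{0}) for a rate increase from *λ*_{0} to *λ*_{1}, the function names and the data are assumptions.

```python
import math


def exponential_reference_value(rate0, rate1):
    """Reference gap k for a rise in the event rate from rate0 to rate1 > rate0.

    Derived from the exponential log-likelihood ratio (assumed design).
    """
    return math.log(rate1 / rate0) / (rate1 - rate0)


def interevent_cusum(gaps, k, h):
    """CUSUM on times between events; returns 0-based alert indices."""
    c = 0.0
    alerts = []
    for i, x in enumerate(gaps):
        c = max(0.0, c + k - x)  # gaps shorter than k push the statistic up
        if c > h:
            alerts.append(i)
            c = 0.0              # reset after an alert
    return alerts


# Hypothetical gaps (in weeks): in-control rate 1 per week, shift of interest to 2.
k = exponential_reference_value(1.0, 2.0)
alerts = interevent_cusum([1.2, 0.9, 0.1, 0.1, 0.2, 0.1, 1.5], k, h=1.5)
```

The reference gap lies between the mean gap under the increased rate and the mean gap under the in-control rate, so a run of short gaps is needed before an alert is raised.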