Statistical efficiency and optimal design for stepped cluster studies under linear mixed effects models

In stepped cluster designs the intervention is introduced into some (or all) clusters at different times and persists until the end of the study. Instances include traditional parallel cluster designs and the more recent stepped‐wedge designs. We consider the precision offered by such designs under mixed‐effects models with fixed time and random subject and cluster effects (including interactions with time), and explore the optimal choice of uptake times. The results apply both to cross‐sectional studies where new subjects are observed at each time‐point, and longitudinal studies with repeat observations on the same subjects. The efficiency of the design is expressed in terms of a ‘cluster‐mean correlation’ which carries information about the dependency‐structure of the data, and two design coefficients which reflect the pattern of uptake‐times. In cross‐sectional studies the cluster‐mean correlation combines information about the cluster‐size and the intra‐cluster correlation coefficient. A formula is given for the ‘design effect’ in both cross‐sectional and longitudinal studies. An algorithm for optimising the choice of uptake times is described and specific results obtained for the best balanced stepped designs. In large studies we show that the best design is a hybrid mixture of parallel and stepped‐wedge components, with the proportion of stepped wedge clusters equal to the cluster‐mean correlation. The impact of prior uncertainty in the cluster‐mean correlation is considered by simulation. Some specific hybrid designs are proposed for consideration when the cluster‐mean correlation cannot be reliably estimated, using a minimax principle to ensure acceptable performance across the whole range of unknown values. © 2016 The Authors. Statistics in Medicine published by John Wiley & Sons Ltd.


1. Introduction
In a 'stepped' cluster design, an intervention is introduced into some or all of the clusters at (possibly) different uptake-times during the study. Once introduced into a cluster, the intervention persists until the end of the study. We assume that outcome-data is collected in all clusters throughout the duration of study and that interest centres on the contrast between the treated (post-intervention) and control (pre-intervention) conditions. This class of study design includes some traditional parallel cluster designs [1] with ongoing patient recruitment (where one group of clusters receives the intervention at the start, and the remainder at the end, or not at all) as well as the recent 'stepped-wedge' designs in which the intervention is introduced into all clusters but at staggered (often regularly spaced) time-points [2][3][4][5]. Some other examples feature in Figure 1. Stepped-wedge designs offer a potential advantage over parallel designs in that each cluster functions as its own control. This can be useful when a high proportion of the variation occurs at the cluster level, but it must be set against the temporal confounding that may arise because the number of clusters exposed to the intervention increases over the course of the study. The tension between cluster-level effects and temporal confounding is a critical factor for the performance of stepped designs [6].
We suppose that a study for a cluster-level intervention is planned to take place in K clusters and that each cluster contributes m outcome measurements (observations) at each of T regularly-spaced time-points. The discussion here applies to two main types of cluster study: (i) longitudinal (cohort) studies where Km subjects are recruited at the beginning of the study, with repeat observations at each time-point; and (ii) cross-sectional studies where KTm different subjects are recruited during the course of the study, and each contributes a single observation on the main outcome measure. We exclude 'open' cohort designs, i.e. longitudinal studies with significant drop-out and/or ongoing recruitment. Instances of these two types are provided by two paradigms: (i) studies of a health intervention in a closed population (such as a care-home) where the same subjects are monitored over a period of time [7,8]; and (ii) service-delivery studies in hospital in-patient or emergency departments where a single outcome measure is obtained from each member of a continuously changing patient population [9,10]. Further instances of each type are enumerated in a recent review [4]. In either case we assume that the frequency and timing of data-collection are already settled, and focus on the choice of intervention uptake-times: specifically the impact of this choice on the statistical performance of the intervention effect estimate.
The observations are described by a linear mixed effects model with fixed time and random cluster effects due to Hussey and Hughes [11], with the addition of random cluster-specific time components and subject-level components for longitudinal studies [12]. Even without these additions, the Hussey and Hughes model has been found useful for the design of stepped studies [13,14]. It is directly applicable when the outcome is continuous, and approximately for binary outcomes when m is large [11]. It was used also in a recent paper that addresses the optimal design of stepped-wedge studies [15], although under conditions that exclude most of the optimal designs in the present study.
Some designs using eight clusters are illustrated in Figure 1. In each particular design the number of observation-times in each cell is constant, with treatment status coded as 1 if exposed and 0 if unexposed to the intervention. The total number of observations in each cluster (mT = M) is the same for all designs. This means that the cell-sizes may differ between designs. Thus a cell in Figure 1b (the parallel design) contains twice as many observations as a cell in Figure 1a.
In most of these examples the number of treated clusters grows over the course of the study. An exception is the cross-over design in Figure 1a. This design is often not practical because the intervention is withdrawn in some clusters halfway through the study. However it is useful as a reference design against which the relative efficiency of other designs can be calibrated. The 'Delay-Control Design' (a terminology suggested by an anonymous referee) in Figure 1c is an elaboration of the parallel design in which all clusters contribute baseline control observations and all receive the intervention towards the end of the study. A stepped-wedge design arises if all clusters receive the intervention with uptake times evenly spaced throughout the study. The stepped-wedge in Figure 1d has eight clusters but only g = 4 uptake times. Alternatives with g = 2 or 8 are also available. The final design (Figure 1e) is motivated by an optimality result described later. It represents a 50:50 hybrid combination of a parallel design in the top and bottom two clusters with a stepped-wedge-type layout in the middle four clusters, although with reduced time before the first and after the last uptake point.
Figure 1. Schematic for some designs with eight clusters. The horizontal direction represents time and the total duration of the study (which is the same for each design) is equally divided between the columns. The rows represent clusters. So each cell represents the observation-times in a single cluster over one fixed time period. Treated cells are denoted by 1, and Controls by 0.
The paper contains two main sections. In section 2 a generic expression for the precision of the treatment effect estimate is obtained. This leads to a simple graphical method for comparing the efficiency of designs. Section 3 is concerned with the optimal choice of uptake times when the numbers of clusters and observation time-points are both fixed, and includes some suggestions for practical design choices.
2. The precision of the effect estimate under a linear mixed effects model

The model
Each cluster i (i = 1,…,K) generates m measurements at every time-point j (j = 1,…,T) according to the following model:
$$Y'_{ijl} = t_j + c_i + (ct)_{ij} + s_{l(i)} + (st)_{l(i)j} + \theta J_{ij} \qquad (1)$$
Here $Y'_{ijl}$ has variance $\sigma_1^2$ and is the outcome for the lth observation (l = 1,…,m) at time j in cluster i; $c_1,\ldots,c_K$ are mutually independent random cluster effects (variance $= \eta_C\sigma_1^2$); $t_1,\ldots,t_T$ are fixed time-effects; $s_{l(i)}$ and $(st)_{l(i)j}$ are within-cluster random effects with variances $\eta_S\sigma_1^2$ and $\eta_{ST}\sigma_1^2$ respectively; $(ct)_{ij}$ represents cluster-level fluctuations of the time-effect with variance $\eta_{CT}\sigma_1^2$, so that $\eta_{CT} = 1 - \eta_C - \eta_S - \eta_{ST}$; $\theta$ is the intervention effect whose estimation is the purpose of the analysis; and $J_{ij}$ is a binary variable (as in Figure 1) which indicates whether or not the intervention is present at time j in cluster i. The random components in this model are as in Teerenstra et al. [12]. Note that we use M = Tm to denote the total number of observations in each cluster. The number of time parameters is dictated by the number of observation times (T), and this may be greater than the number of columns in the representation used in Figure 1. However, the main precision formula below applies also for models with a single time parameter for each column.
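This formulation covers two important cases:
1. Cross-sectional study with no within-cluster time effects: $\eta_S = 0$, $\eta_{CT} = 0$. At each time-point m (new) subjects are observed within each cluster. The (st) component combines variation between subjects (within clusters) and within-subject measurement error. Here $\eta_C$ is the conventional intra-cluster correlation coefficient (ICC).
2. Longitudinal (cohort) study with repeat observations on the same m subjects in each cluster at every time-point. The component $s_{l(i)}$ then represents the persistent effect of subject l in cluster i.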
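The cell means $Y_{ij} = \frac{1}{m}\sum_{l=1}^{m} Y'_{ijl}$ obey a model 2 of the form proposed by Hussey and Hughes [11], which is the basis of our development. This is:
$$Y_{ij} = t_j + \theta J_{ij} + \gamma_i + \varepsilon_{ij} \qquad (2)$$
where $\gamma_i \sim (0, \rho\sigma^2)$ and $\varepsilon_{ij} \sim (0, (1-\rho)\sigma^2)$ are independent random components. The variance of $Y_{ij}$ can be partitioned as
$$\sigma^2 = \operatorname{var}(Y_{ij}) = \underbrace{\sigma_1^2\left(\eta_C + \eta_S/m\right)}_{\rho\sigma^2} + \underbrace{\sigma_1^2\left(\eta_{CT} + \eta_{ST}/m\right)}_{(1-\rho)\sigma^2}$$
The quantity $\rho$ is the correlation between $Y_{ij}$-values obtained in the same cluster at different times, and corresponds to the ICC for the $Y_{ij}$s. Unlike ICCs for the individual observations $Y'_{ijl}$, $\rho$ can easily be large (i.e. close to 1), particularly if $\eta_{CT} = 0$.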

The cluster-mean correlation
The efficiency of a stepped cluster design turns on the quantity R, defined by
$$R = R(T,\rho) = \frac{T\rho}{1 + (T-1)\rho}.$$
The value of R has a simple interpretation in terms of the $Y_{ij}$ as the proportion of the variance of a cluster-mean $\bar{Y}_{i\bullet} = \frac{1}{T}\sum_j Y_{ij}$ that comes from random effects that are independent of time (i.e. from the $c_i$ and $s_{l(i)}$). In many circumstances this definition is aligned with that for the correlation between the means of two replicate sets of observations, say $\bar{Y}^{(1)}_{i\bullet}$ and $\bar{Y}^{(2)}_{i\bullet}$, from the same cluster i, although the nature of the replication must be carefully considered. For a cross-sectional study with $\eta_S = 0$, $\eta_{CT} = 0$ the replicate set simply entails the recruitment of a new set of subjects in the same cluster. Moreover, in this case, $R = M\eta_C/(1 + (M-1)\eta_C)$, furnishing a direct connection with $\eta_C$, the ICC for the individual observations. A similar interpretation in longitudinal studies is possible if the 'replicate set' is taken to require new observations at the same times on the same subjects rather than a set of measurements on new subjects.
In all cases we refer to R as the cluster-mean correlation (CMC). In general 0 ≤ R ≤ 1 and large values of R (i.e. close to 1) are possible even in cross-sectional studies with a small ICC ($\eta_C$) for the individual observations. This feature is illustrated in Figure 2.
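As an illustration of this feature, the following Python sketch evaluates the CMC for a cross-sectional study. It assumes the forms R = Tρ/(1 + (T − 1)ρ) and, when η_S = η_CT = 0, R = Mη_C/(1 + (M − 1)η_C) with M = Tm. The function names are illustrative only and are not part of the paper.

```python
def cmc(T, rho):
    """Cluster-mean correlation R(T, rho) for T time-points (assumed form)."""
    return T * rho / (1 + (T - 1) * rho)


def cmc_cross_sectional(M, icc):
    """CMC for a cross-sectional study with M = T*m observations per cluster
    and individual-level ICC eta_C (assumes eta_S = eta_CT = 0)."""
    return M * icc / (1 + (M - 1) * icc)


if __name__ == "__main__":
    # A small ICC can still give a large CMC when the clusters are big.
    for M in (10, 100, 1000):
        print(M, round(cmc_cross_sectional(M, 0.01), 3))
```

With an ICC of 0.01 the CMC is already about one half once each cluster contributes 100 observations in total, which is the sense in which R can be large while the ICC is small.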

A precision formula
The performance of the design can be identified with the precision of the best linear unbiased estimate (BLUE) of the treatment effect $\theta$ in model 1, with known variances. The BLUE under model 1 is also the BLUE under the Hussey and Hughes model 2, so that the following result can be applied:
(Precision of the effect estimate) Under the model in 2 the BLUE of $\theta$ has precision given by:
$$\Pi = \frac{KT}{\sigma^2(1-\rho)}\left(a_D - R\,b_D\right) \qquad (3)$$
where $R = R(T,\rho)$ is the CMC and $a_D$ and $b_D$ are constants, specific to the particular design D, that depend on the distribution of the treatment indicator $J_{ij}$ over the units in the study.
Specifically we have
$$a_D = \frac{1}{KT}\sum_{i,j}\left(J_{ij} - \bar{J}_{\bullet j}\right)^2, \qquad b_D = \frac{1}{K}\sum_{i}\left(\bar{J}_{i\bullet} - \bar{J}_{\bullet\bullet}\right)^2$$
making use of the 'dot-bar' notation, so that $\bar{J}_{i\bullet}$ means $\frac{1}{T}\sum_j J_{ij}$. In the language of the analysis of variance, $a_D$ is the 'within-columns' variance of the binary variable $J_{ij}$, and $b_D$ is the 'between-rows' variance of $J_{ij}$, under a probability model in which equal weight is assigned to each point of the (i,j) lattice. The result 3 is established in Appendix A under the conditions of model 2, and provides a straightforward method to calculate the precision for any configuration of $J_{ij}$s.
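A minimal sketch of how expression 3 might be evaluated for an arbitrary configuration of the J_ij. It assumes the form Π = KT(a_D − R b_D)/(σ²(1 − ρ)), with a_D and b_D taken as the within-columns and between-rows variances described above. The function names `design_coefficients` and `precision` are hypothetical.

```python
from fractions import Fraction


def design_coefficients(J):
    """Return (a_D, b_D) for a K x T 0/1 treatment matrix J (rows = clusters)."""
    K, T = len(J), len(J[0])
    col_means = [Fraction(sum(row[j] for row in J), K) for j in range(T)]
    row_means = [Fraction(sum(row), T) for row in J]
    grand = sum(row_means) / K
    # 'Within-columns' variance of J, equal weight on every lattice point.
    a = sum((J[i][j] - col_means[j]) ** 2
            for i in range(K) for j in range(T)) / (K * T)
    # 'Between-rows' variance of the row means.
    b = sum((r - grand) ** 2 for r in row_means) / K
    return a, b


def precision(J, rho, sigma2=1.0):
    """Precision of the BLUE of theta under the assumed form of expression 3."""
    K, T = len(J), len(J[0])
    a, b = design_coefficients(J)
    R = T * rho / (1 + (T - 1) * rho)  # cluster-mean correlation
    return K * T * (float(a) - R * float(b)) / (sigma2 * (1 - rho))


if __name__ == "__main__":
    stepped = [[0, 1, 1],
               [0, 0, 1]]  # two clusters, three time-points
    print(design_coefficients(stepped), precision(stepped, rho=0.5))
```

Exact fractions are used for the coefficients so that they can be compared directly with closed-form values.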
The argument in Appendix A also shows that the precision expression 3 holds for a model with linear constraints on the $t_j$ parameters, provided that the resulting model is rich enough to represent an initial level, together with a separate time-effect for each uptake point. If the design is represented as in Figure 1, the minimum requirement is a separate time parameter for each column.
(Extension to fixed cluster effects). If the cluster effects ($\gamma_i$) are treated as fixed, the precision of the estimate of $\theta$ is recovered by setting R(T,ρ) = 1 in expression 3.
The result 3 is formally equivalent to an expression given by Hussey and Hughes [11]. However, these authors do not expose the important relationship with the CMC.

Precision for some particular designs
The design coefficients a D and b D in 3 are presented for some particular designs in Table I below. The calculations are outlined in Appendix B. In some cases these expressions apply strictly only if certain divisibility conditions on T and K are met. For example, the results for the g-step stepped-wedge design require that T be a multiple of (g + 1), and K a multiple of g. Where a required condition does not hold, the results may be regarded as approximations whose validity improves in larger studies.
Cluster Cross-Over design (CXO): For this design it is assumed that both K and T are even. Initially there are K/2 intervention clusters and K/2 control clusters. Halfway through the study each cluster crosses over to the alternative condition.
Parallel Design (PD): This is a classic design for a cluster-randomised trial. An even number of clusters is assumed, equally divided between treated and control conditions.
Delay-Control Design (DCD$_{p,q,r}$): A parallel layout runs only for a proportion q of the study duration, preceded by a period (proportion p) in which all clusters remain in the control condition, and followed by a period (proportion r) in which the treatment is present in all clusters (p + q + r = 1). The PD (= DCD$_{0,1,0}$) and the ANCOVA design [12] (= DCD$_{0.5,0.5,0}$) are covered as special cases. The design coefficients ($a_D$, $b_D$) depend only on the proportion of time (q) occupied by the parallel layout. For both the PD and the DCD it is possible to vary the proportion of control clusters within the parallel layout from the 50% assumed here. The effect would be to reduce both design coefficients, and hence the precision, by the factor 4s(1 − s) ≤ 1 where s is the (new) proportion of control clusters.
Table I. Design coefficients for some selected designs. These determine the precision using equation 3 and the design effect using equation 5. The final column represents the asymptotic relative efficiency (ARE) of the design (compared to CXO) when the number of observations is large.
Stepped-Wedge design with g steps (SW g ): This is the 'standard' stepped-wedge design. The clusters are divided into g equal groups. Initially all clusters are untreated. The uptake time is the same for all clusters in any one group. In the first group this occurs when a fraction of the total study time equal to 1/(g + 1) has elapsed and T/(g + 1) observation time-points have passed. The uptake times in the remaining groups occur at time-fractions 2/(g + 1), 3/(g + 1),…, etc. until all clusters are exposed. During the final part of the study (from time-fraction g/(g + 1) to the end) all clusters are exposed to the intervention.
Modified SW design (MSW$_g$): In this modification of the SW$_g$ design the time period before the first (and after the last) uptake point is equal to one half of the time between consecutive uptake points, as in the middle four clusters in Figure 1e. Because $a_{MSW} - b_{MSW}R > a_{SW} - b_{SW}R$ for all values of R, the MSW$_g$ design generates greater precision than the conventional SW$_g$ design, although the relative advantage diminishes as g increases.
Stepped-Wedge/Parallel Hybrid designs ($^{\beta}H_g$): These designs achieve the maximum precision over stepped designs with overall balance between treated and control observations, a result established below in section 3. In the $^{\beta}H_g$ hybrid, $K\beta$ clusters are assigned to a modified stepped-wedge layout with g uptake points, and the remaining $K(1-\beta)$ clusters to a concurrent parallel layout.
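The layouts described above can be written down as 0/1 matrices and their design coefficients computed numerically. The sketch below is one way to do this under stated assumptions: time-point j (counting from 0) is taken to be treated once the elapsed time-fraction j/T has reached the cluster's uptake fraction, and the divisibility conditions on T and K are assumed to hold. All function names are hypothetical.

```python
from fractions import Fraction


def coefficients(J):
    """(a_D, b_D): within-columns and between-rows variances of the 0/1 matrix J."""
    K, T = len(J), len(J[0])
    col_means = [Fraction(sum(row[j] for row in J), K) for j in range(T)]
    row_means = [Fraction(sum(row), T) for row in J]
    grand = sum(row_means) / K
    a = sum(p * (1 - p) for p in col_means) / T
    b = sum((r - grand) ** 2 for r in row_means) / K
    return a, b


def design(uptakes, T):
    """One row per cluster. Time-point j (0-based) is treated once the elapsed
    time-fraction j/T has reached the cluster's uptake fraction."""
    return [[int(Fraction(j, T) >= u) for j in range(T)] for u in uptakes]


def sw_uptakes(K, g):
    """Standard SW_g: group k takes up the intervention at fraction k/(g+1)."""
    return [Fraction(k, g + 1) for k in range(1, g + 1) for _ in range(K // g)]


def msw_uptakes(K, g):
    """Modified SW_g: half-length periods before the first and after the last uptake."""
    return [Fraction(2 * k - 1, 2 * g) for k in range(1, g + 1) for _ in range(K // g)]


def hybrid_uptakes(n_treated, n_stepped, n_control, g):
    """Pure treated clusters, then an MSW_g block, then pure control clusters."""
    return ([Fraction(0)] * n_treated + msw_uptakes(n_stepped, g)
            + [Fraction(1)] * n_control)


if __name__ == "__main__":
    print("SW_4  ", coefficients(design(sw_uptakes(8, 4), T=10)))
    print("MSW_4 ", coefficients(design(msw_uptakes(8, 4), T=8)))
    print("hybrid", coefficients(design(hybrid_uptakes(2, 4, 2, 4), T=8)))
```

The last call corresponds to the eight-cluster 50:50 hybrid of Figure 1e. Comparing the first two results gives a numerical check that the modified layout has the larger value of a_D − R b_D at both R = 0 and R = 1.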

Precision-ratio plots
Using the CXO as a reference design (for which $a_D = \frac14$ and $b_D = 0$), the efficiency of any stepped design, D, may be defined as a precision ratio from 3:
$$\frac{\Pi_D}{\Pi_{CXO}} = 4\left(a_D - R\,b_D\right) \qquad (4)$$
The performance of the design when T is large is captured by the asymptotic relative efficiency (ARE) as R → 1 ($= 4a_D - 4b_D$) as given in the final column of Table I. The fact that this is zero for the PD reflects the generally poor performance of parallel cluster trials with large cluster sizes [16]. The other designs all have non-zero AREs so that there is no upper limit to the precision that can be obtained by increasing the number of observations in each cluster.
Figure 3 shows the precision-ratio as a function of the CMC, R, for the designs in Figure 1. When R is small, the PD outperforms the other designs, but is the worst design when R = 1. The SW$_4$ design performs relatively well for large R but is dominated by MSW$_4$. Similarly DCD with q = 2/3 is dominated by $^{0.5}H_4$. Over the range 3/11 ≤ R ≤ 9/11 the $^{0.5}H_4$ design gives the best precision among these designs. Outside this range PD (for smaller R) or MSW$_4$ (larger R) are best. If the value of R is uncertain the $^{0.5}H_4$ design might be preferred as a compromise design, at least on statistical grounds, because it performs reasonably well throughout the range.
This approach can inform the choice between any two candidate designs: for example, that between a stepped-wedge and parallel design [14,16-22]. From 4 the stepped-wedge is the more efficient only if $R > r_0$ where the threshold $r_0$ satisfies
$$a_{SW} - r_0\,b_{SW} = a_{PD} - r_0\,b_{PD}.$$
This implies that $r_0 = \frac12\left(1 + \frac{1}{g}\right)$ for a standard SW$_g$ design, and $r_0 = \frac12\left(1 + \frac{3}{1+2g^2}\right)$ if an MSW$_g$ is used. In general, a SW design (of either type) can be better than a PD only if the cluster-mean correlation R exceeds ½. In a cross-sectional study, where the ICC of a single observation (i.e. $\eta_C$) is generally small, this translates into a useful rule of thumb: to prefer the SW option only if the ICC is greater than 1/M.
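The crossing points quoted in this section can be checked with exact arithmetic. The sketch below assumes the precision ratio 4(a_D − R b_D) and uses closed-form design coefficients that were derived for this example from the within-columns and between-rows variance definitions (they are not copied from Table I, and they presume the divisibility conditions hold). The function names are hypothetical.

```python
from fractions import Fraction as F

PD = (F(1, 4), F(1, 4))  # parallel design: a_D = b_D = 1/4


def sw(g):
    """Assumed (a_D, b_D) for the standard stepped-wedge SW_g."""
    return F(g - 1, 6 * g), F(g - 1, 12 * (g + 1))


def msw(g):
    """Assumed (a_D, b_D) for the modified stepped-wedge MSW_g."""
    return F(g * g - 1, 6 * g * g), F(g * g - 1, 12 * g * g)


def hybrid(beta, g):
    """Assumed (a_D, b_D) for the hybrid with a proportion beta of MSW_g clusters."""
    beta = F(beta)
    a = F(1, 4) - beta ** 2 * F(g * g + 2, 12 * g * g)
    b = F(1, 4) - beta * F(2 * g * g + 1, 12 * g * g)
    return a, b


def crossing(design, reference):
    """Value of R at which the two precision-ratio lines 4(a - R b) intersect."""
    (a1, b1), (a2, b2) = design, reference
    return (a2 - a1) / (b2 - b1)


if __name__ == "__main__":
    h = hybrid(F(1, 2), 4)
    print("PD vs 0.5-H-4:   ", crossing(h, PD))
    print("0.5-H-4 vs MSW_4:", crossing(msw(4), h))
    print("r0 for SW_4:     ", crossing(sw(4), PD))
```

Under these assumptions the first two crossings evaluate to 3/11 and 9/11, and the SW threshold agrees with the formula for r_0 given above.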

Design effects
A design effect is the ratio of the precision of the treatment effect estimate in an individually randomised trial (RCT) to that in the cluster trial, and is convenient for sample size calculations. For a CXO design under model 1 the components $c_i$ and $s_{l(i)}$ cancel out in the treatment-effect estimate and the design effect is just $m\eta_{CT} + \eta_{ST}$. It follows that the design effect for a general design D is given by
$$\mathrm{Deff}_D = \frac{m\eta_{CT} + \eta_{ST}}{4\left(a_D - R\,b_D\right)} \qquad (5)$$
For a parallel cross-sectional (PD) design with M observations per cluster and no cluster-level time effects ($\eta_S = \eta_{CT} = 0$) the formula 5 becomes $(1 - \eta_C)/(1 - R)$ and yields the well-known design effect $[1 + (M - 1)\eta_C]$ [1]. Published stepped-wedge design effects [14] can also be obtained in this way.
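For sample size work the design effect can be wrapped in a small function. The sketch below assumes that formula 5 has the form (mη_CT + η_ST)/(4(a_D − R b_D)), and that ρ is obtained by averaging model 1 over the m observations in a cell, so that ρ = (mη_C + η_S)/(mη_C + η_S + mη_CT + η_ST). The latter expression is our own working for this example, and the function name is hypothetical.

```python
def design_effect(a_D, b_D, m, T, eta_C, eta_S=0.0, eta_CT=0.0):
    """Design effect relative to an individually randomised trial (assumed form)."""
    eta_ST = 1 - eta_C - eta_S - eta_CT
    invariant = m * eta_C + eta_S    # time-invariant variance (scaled by m / sigma_1^2)
    varying = m * eta_CT + eta_ST    # time-varying variance; also the CXO design effect
    rho = invariant / (invariant + varying)
    R = T * rho / (1 + (T - 1) * rho)  # cluster-mean correlation
    return varying / (4 * (a_D - R * b_D))


if __name__ == "__main__":
    # Parallel cross-sectional design: m = 10 subjects at each of T = 5 times.
    m, T, icc = 10, 5, 0.05
    print(design_effect(0.25, 0.25, m, T, icc))  # compare with 1 + (M - 1) * icc
    print(1 + (m * T - 1) * icc)
```

The two printed values should agree up to rounding, reproducing the classical design effect for a parallel cluster trial with M = mT observations per cluster.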

3. Finding the best design
In many traditional designs the intervention is introduced (into some of the clusters) at a single timepoint in the study. This is not true of stepped-wedge designs for which a number (g) of different uptake times will be required. Normally these are equally spaced in time, yet they can be chosen so as to optimise the performance of the design. To this end, we assume that the study will involve T observation times within each of K clusters with observations Y ij generated by the model 2. The problem is to choose a configuration of uptake times in the clusters to achieve the best possible precision for the treatment effect estimate. Note that it is possible for an uptake time to occur before the start of the study (in a pure 'treated' cluster), or, notionally, after it has finished (in a pure 'control' cluster).
Mathematically we must find the values of the $J_{ij}$s (= 0 or 1) that maximise the expression for $\Pi$ in 3, subject to the irreversibility constraint that $j > j' \Rightarrow J_{ij} \ge J_{ij'}$ for all i. First suppose that the clusters have been numbered according to the order in which their uptake times occur (so that $i < i' \Rightarrow J_{ij} \ge J_{i'j}$ for all j). This ordering has been implicitly assumed in the examples in Figure 1 above, with the cluster index i taken as increasing in the upwards direction and the time index j as increasing from left to right. Also it is convenient to map the design points (i,j) on to a lattice in the x-y plane by setting
$$x_j = \frac{2j - T - 1}{2T}, \qquad y_i = \frac{2i - K - 1}{2K}.$$
The x-y design lattice is contained in a unit square centred on the origin (see Figure 4). With these definitions an alternative expression to 3 for the precision is given by
$$\Pi = \frac{KT}{\sigma^2(1-\rho)}\left\{\frac{2}{KT}\sum_{i,j}\left(R\,x_j - y_i\right)J_{ij} - R\,\bar{J}_{\bullet\bullet}\left(1 - \bar{J}_{\bullet\bullet}\right)\right\} \qquad (6)$$
a result derived in Appendix C. Unlike 3 (which holds for any configuration of the $J_{ij}$s) the expression 6 is valid only in the presence of the irreversibility constraint, and where the clusters follow the suggested ordering.

An algorithm for optimal designs
Suppose that $J_{\bullet\bullet}$ (the total number of treated time-points over all clusters) is fixed. Then, in view of 6, the precision is maximised by nominating treated points one by one, beginning with (i, j) = (1, T) and in decreasing order of the quantity $(R\,x_j - y_i)$, until exactly $J_{\bullet\bullet}$ are included in the treated set. Any ambiguity in the ordering caused by tied values can be resolved arbitrarily because tied points contribute the same amount to the expression in 6. In geometrical terms this is achieved by moving a straight line of slope R upwards until exactly $J_{\bullet\bullet}$ points lie on or beneath it (as, for example, in Figure 4). In this way designs with the best precision are determined for each of the (TK − 1) possible values of $J_{\bullet\bullet}$. The overall optimum can then be obtained by maximising the best precision over $J_{\bullet\bullet}$, using 6. This algorithm is easily implemented in a spreadsheet programme and can assist with the design of practical studies.
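The search described above can also be sketched in a few lines of Python. The lattice coordinates x_j = (2j − T − 1)/(2T) and y_i = (2i − K − 1)/(2K) are an assumption of this sketch (chosen to give a symmetric lattice inside the unit square), ties are broken in favour of later time-points, and the criterion a_D − R b_D is computed directly from the treatment matrix rather than from expression 6. Function names are hypothetical.

```python
from fractions import Fraction


def criterion(J, R):
    """a_D - R*b_D, computed directly from the 0/1 matrix J (rows = clusters)."""
    K, T = len(J), len(J[0])
    col_means = [Fraction(sum(row[j] for row in J), K) for j in range(T)]
    row_means = [Fraction(sum(row), T) for row in J]
    grand = sum(row_means) / K
    a = sum(p * (1 - p) for p in col_means) / T
    b = sum((r - grand) ** 2 for r in row_means) / K
    return a - R * b


def greedy_designs(K, T, R):
    """Yield (n_treated, criterion, J) for n_treated = 1, ..., K*T - 1."""
    R = Fraction(R).limit_denominator(10 ** 6)
    x = [Fraction(2 * j - T - 1, 2 * T) for j in range(1, T + 1)]  # assumed lattice
    y = [Fraction(2 * i - K - 1, 2 * K) for i in range(1, K + 1)]
    # Decreasing order of R*x_j - y_i; ties go to the later time-point.
    points = sorted(((i, j) for i in range(K) for j in range(T)),
                    key=lambda p: (R * x[p[1]] - y[p[0]], p[1]), reverse=True)
    J = [[0] * T for _ in range(K)]
    for n, (i, j) in enumerate(points[:-1], start=1):
        J[i][j] = 1
        yield n, criterion(J, R), [row[:] for row in J]


def optimal_design(K, T, R):
    """Best design over all possible numbers of treated points."""
    return max(greedy_designs(K, T, R), key=lambda item: item[1])


if __name__ == "__main__":
    n, value, J = optimal_design(10, 6, 0.6)
    print("treated points:", n, "criterion:", value)
    for row in reversed(J):  # cluster index increases upwards, as in Figure 1
        print(" ".join(map(str, row)))
```

Multiplying the criterion by KT/(σ²(1 − ρ)) gives the precision, so the ranking of designs does not depend on the variance parameters once R is fixed.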

Best balanced designs
Practical experience with this algorithm shows: (i) that the optimal design is not always unique; and (ii) that when the number of design points is an even number there may still be no optimal design with overall balance between the treatment arms, i.e. for which $J_{\bullet\bullet} = \frac12 TK$. Notwithstanding (ii) it turns out that the design with greatest precision among those that satisfy the balance condition (the best balanced design, or BBD) is often approximately optimal, especially when both T and K are large.
The BBD can be determined from the algorithm above by setting $J_{\bullet\bullet} = \frac12 TK$.
(Best Balanced Design) Suppose that the number of design points, TK, is even. A design in which the treated points consist of those for which $y_i < R\,x_j$ together with half of the points (if there are any) for which $y_i = R\,x_j$ is a best balanced design. In such a design, the boundary between the treated and untreated lattice-points is a straight line through the origin with slope R. An example is shown in Figure 4. The first group of P clusters receive the intervention at the start of the study (i.e. they are 'Treated' clusters) where P is the label of the last cluster for which the point $(x_1, y_P)$ lies on, or just below, the boundary line y = Rx. This condition translates into the inequalities:
$$y_P \le R\,x_1 < y_{P+1}.$$
The last P clusters are 'control clusters' and do not receive the intervention at all. The K − 2P clusters in the middle participate in a genuine stepped study with equal intervals of time between the uptake points. In some cases the design for these middle clusters corresponds exactly to a MSW$_{K-2P}$ design, though this requires that T = 2(K − 2P)l for some integer l, where l is the number of time-points in each cluster before the first uptake time (and 2l is the number between two successive uptake times). In the general case, the correspondence to the MSW design is only approximate.
This result shows that the best balanced design when T is large approximates to a hybrid design $^{R}H_{RK}$ in which a stepped-wedge layout is followed in a proportion of the clusters equal to the CMC, R, and a parallel layout in the remaining clusters.

Examples of optimal design
Consider the design of a study in 10 clusters over six observation times. The design lattice contains 60 (=10 × 6) points. The observations within each cluster are either unique measurements on 6m separate subjects, or they could be six repeat measurements on the same m subjects. The value of m need not be separately specified because the statistical design issues depend only on the CMC, R.
Best balanced designs for this problem are shown in Figure 5 for the full range of R-values from 0 to 1. For each design the range of validity includes both end-points, so that, for instance, design (a) (a parallel layout) and design (b) are both BBDs when R = 0.12. The BBDs are unique except at these isolated end-points. In a majority of cases the BBD is also an optimal design. Exceptions include design (b) for R = 0.2, and design (h) for R = 1. In each case conversion to optimality is achieved by modifying the treated set either to exclude the circled point or to include the boxed point. By repeated application of the optimality algorithm with R = 0(0.001)1 it was found that the BBD is optimal in 77.5% of cases, and achieved a relative efficiency of at least 98.83% for all values of R, with the worst case at R = 0.6. The mean relative efficiency of the BBDs was 99.92%, a minimal loss of efficiency compared to the true optimum.
The case with R = 0.6 is complex: designs (d), (e), and (g) are just three out of 20 possible BBDs in this case. The other seventeen BBDs are obtained by choosing different sets of three treated points from the six design points along the diagonal indicated in Figure 5(g), whose slope (=0.6) exactly matches the value of R. Design (g) has been singled out here because it is an example of an exact hybrid design ( 0.6 H 3 ). However, none of these 20 BBDs is fully optimal when R = 0.6. An optimal design in this case omits all six diagonal points from the treated set.
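The counting argument in this example can be reproduced numerically. The sketch below classifies the lattice points of a K × T design as lying below or on the line y = Rx, and counts the best balanced designs as the number of ways of choosing half of the tied points. It assumes the lattice coordinates x_j = (2j − T − 1)/(2T) and y_i = (2i − K − 1)/(2K), an even number of design points, and R > 0 (for R = 0 an arbitrary choice among tied points need not respect irreversibility). The function name is hypothetical.

```python
from fractions import Fraction
from math import comb


def bbd_summary(K, T, R):
    """Count the points below and on the line y = R*x, and the implied number of BBDs."""
    R = Fraction(R).limit_denominator(10 ** 6)
    x = [Fraction(2 * j - T - 1, 2 * T) for j in range(1, T + 1)]  # assumed lattice
    y = [Fraction(2 * i - K - 1, 2 * K) for i in range(1, K + 1)]
    cells = [(i, j) for i in range(K) for j in range(T)]
    below = [(i + 1, j + 1) for i, j in cells if y[i] < R * x[j]]
    ties = [(i + 1, j + 1) for i, j in cells if y[i] == R * x[j]]
    return {
        "below": len(below),                          # always treated in a BBD
        "ties": ties,                                 # half of these are treated
        "n_bbd": comb(len(ties), len(ties) // 2),     # valid for R > 0 only
    }


if __name__ == "__main__":
    for R in (0.12, 0.6):
        print(R, bbd_summary(10, 6, R))
```

For K = 10 and T = 6 this gives six tied points and 20 balanced choices at R = 0.6, and two tied points (hence two BBDs) at R = 0.12, in line with the discussion of Figure 5.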
All the examples shown include pure 'control' clusters and pure 'treated' clusters as part of the optimal design. This is a common feature of optimal stepped designs as defined here, but is specifically excluded in recent work by Lawrie et al. [15].

Admissible designs for large studies
When T and K are large, the precision for the best balanced design approximates to that of the overall optimal design, with an error of small order in the quantity min(T,K). This is shown in Appendix D by means of an integral approximation to 6. It follows that the efficiency (relative to the CXO) of the overall optimal design for a large study is that of a hybrid design in which g is large (i.e. $^{R}H_\infty$). From Table I this is $1 - R + \frac13 R^2$. In Figure 6 this quadratic forms a boundary curve; the straight lines correspond to the hybrid designs, and generate tangents to the boundary curve. The parallel (R = 0) and stepped-wedge (R = 1) designs are both admissible. The region of the unit square in Figure 6 above the quadratic curve is not attainable by any stepped design.
Hybrid designs do not appear to have been widely used in practice. We are aware of only one example [23]. However they offer an appealing compromise between the parallel and stepped-wedge designs given that these can offer optimal precision only at the extreme ends of the range for the cluster-mean correlation (i.e. when R = 0 and 1, respectively). An appropriate hybrid design can improve the precision of the treatment estimate in all other cases.

Minimax designs
Where reliable information about the CMC is not available it makes sense to choose an admissible design that minimises the maximum possible loss of precision relative to the optimal design. For large studies this is a hybrid design $^{\tilde\beta}H_\infty$ (the 'minimax hybrid') where $\tilde\beta$, the proportion of SW clusters, is given by
$$\tilde\beta = \tfrac12\left(3 - \sqrt{3}\right) \approx 0.634.$$
At the minimax solution the relative loss of precision is the same at both ends of the range for R (i.e. at R = 0 and R = 1) because of the convexity of the quadratic boundary curve in Figure 6. So $\tilde\beta$ satisfies
$$1 - \tfrac13\tilde\beta^2 = 2\tilde\beta - \tilde\beta^2.$$
Near-minimax hybrid designs are listed in Table II. The designs in Table II can be 'scaled up' to larger numbers of clusters (rows) and observation periods (columns) without affecting the precision relative to the CXO design, which can depend only on β and g. The best performance in the table is achieved by design 4, illustrated in Figure 7. Note that the 50:50 Hybrid in Figure 1e achieves a worst-case relative precision (WRP) of 75%. Even the simplest five-cluster design in the table (design 1) achieves a WRP of nearly 83%. This design was encountered above (Figure 5(g)) as a BBD for the case K = 10, T = 6.
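The worst-case relative precision of a given hybrid can be checked numerically against the large-study optimum 1 − R + R²/3. The sketch below assumes closed-form design coefficients for the hybrid with a proportion β of MSW_g clusters, namely a_D = 1/4 − β²(1 + 2/g²)/12 and b_D = 1/4 − β(2 + 1/g²)/12. These were derived for this example from the variance definitions of a_D and b_D and are not quoted from Table I. Function names are hypothetical.

```python
from math import sqrt


def hybrid_efficiency(beta, g, R):
    """Precision relative to the CXO, 4(a_D - R b_D), under the assumed coefficients."""
    a = 0.25 - beta ** 2 * (1 + 2 / g ** 2) / 12
    b = 0.25 - beta * (2 + 1 / g ** 2) / 12
    return 4 * (a - R * b)


def large_study_optimum(R):
    """Efficiency of the optimal design for a large study at CMC value R."""
    return 1 - R + R ** 2 / 3


def worst_relative_precision(beta, g, grid=1000):
    """Minimum over a grid of R in [0, 1] of the precision relative to the optimum."""
    return min(hybrid_efficiency(beta, g, k / grid) / large_study_optimum(k / grid)
               for k in range(grid + 1))


if __name__ == "__main__":
    print("50:50 hybrid, g = 4:", worst_relative_precision(0.5, 4))
    print("0.6 hybrid, g = 3:  ", worst_relative_precision(0.6, 3))
    beta_mm = (3 - sqrt(3)) / 2
    print("minimax hybrid:     ", worst_relative_precision(beta_mm, float("inf")))
```

Passing `float("inf")` for g gives the limiting hybrid with many uptake points. Under these assumptions the minimax proportion yields a worst case of √3/2, roughly 87%, attained at both ends of the range of R.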

Design performance when R is uncertaina simulation study
In practice the likely performance of a chosen design depends on the degree of uncertainty in the prior estimate of the CMC. This was investigated using a class of prior distributions for R of the form:
$$\log\frac{R}{1-R} \sim N\!\left(\log\frac{R_0}{1-R_0},\ \tau^2\right)$$
where $R_0$ is the median CMC value, and prior uncertainty is characterised by the variance parameter $\tau^2$.
Values of $\tau^2$ were chosen to correspond to three different coefficients of variation of the quantity R/(1 − R): 0.25 (low uncertainty); 0.5 (medium uncertainty); and 1.0 (high uncertainty). In view of the relationship $R/(1-R) = T\rho/(1-\rho)$ [$= M\eta_C/(1-\eta_C)$ for a cross-sectional study] the prior is consistent with a similar level of uncertainty for $\rho$ (or $\eta_C$). In particular, this analysis applies directly to the case of a cross-sectional study with uncertain ICC.
Table III shows the results of a simulation study for the distribution of the relative precision of some particular design-choices compared to the BBD at the true value of R. Centiles from the distribution are displayed, alongside the WRP. The designs considered are: (i) BB$_0$, the best balanced design at the median-estimate, $R_0$; (ii) the near-minimax hybrid design $^{0.6}H_3$; (iii) Mmx, the true-minimax design for large studies (= $^{0.634}H_\infty$). Two cases are shown: (i) a 'small' study with 10 clusters and six observation times (for which the BBDs are shown in Figure 5); and (ii) the limiting case of large studies (T, K → ∞), for which the BBDs are hybrid designs.
The BB$_0$ designs show excellent relative precision for medium or low uncertainty about R: the median (cent50) relative precision is at least 99% and the 5th centile at least 95% in all cases. Under higher levels of uncertainty there is some deterioration in performance, even in large studies. Even so the 5th centile still exceeds 90% in all tabulated cases. In general, performance is somewhat better for values of $R_0$ towards the ends rather than the middle of the range.
Nevertheless the low values of the WRP for many of the BB$_0$ designs (it can be as low as 0 when $R_0$ is small) highlight the need for caution if the prior information is unreliable. In this case a more robust choice is the near-minimax design $^{0.6}H_3$ (design 1 in Table II). In the small study its WRP of 85.3% guarantees acceptable performance even when the CMC is very low. In large studies, it is inferior to the true minimax design.
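A simulation of this kind can be sketched for the limiting case of large studies. The code below is not the authors' simulation. It assumes (i) a logit-normal prior, log(R/(1 − R)) ~ N(log(R_0/(1 − R_0)), τ²), with τ² = log(1 + CV²) so that R/(1 − R) has the stated coefficient of variation, and (ii) that a large-study hybrid with stepped proportion β has efficiency 1 − R + (2Rβ − β²)/3 against an optimum of 1 − R + R²/3. The function names and the seed are hypothetical.

```python
import math
import random


def relative_precision(beta, R):
    """Large-study hybrid with stepped proportion beta, relative to the optimum at R."""
    hybrid = 1 - R + (2 * R * beta - beta ** 2) / 3
    optimum = 1 - R + R ** 2 / 3
    return hybrid / optimum


def simulate(beta, R0, cv, n=9999, seed=2016):
    """Centiles of the relative precision under the assumed logit-normal prior."""
    rng = random.Random(seed)
    tau = math.sqrt(math.log(1 + cv ** 2))  # CV of R/(1-R) -> tau (lognormal identity)
    mu = math.log(R0 / (1 - R0))            # prior median of R is R0
    draws = sorted(
        relative_precision(beta, 1 / (1 + math.exp(-rng.gauss(mu, tau))))
        for _ in range(n)
    )
    centile = lambda p: draws[int(p * (n - 1))]
    return {"cent5": centile(0.05), "cent50": centile(0.50), "min": draws[0]}


if __name__ == "__main__":
    print("BB_0 at R0 = 0.5, CV = 0.5:", simulate(beta=0.5, R0=0.5, cv=0.5))
    beta_mm = (3 - math.sqrt(3)) / 2
    print("minimax, R0 = 0.2, CV = 1: ", simulate(beta=beta_mm, R0=0.2, cv=1.0))
```

Here the best balanced design at the prior median is represented by the hybrid with β = R_0, which is its large-study limit. Results for a small study would instead require the finite-lattice BBD at each simulated value of R.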

Comments
The mixed effects model 1 is a vehicle to express a certain correlation-structure with additive fixed effects for time and treatment, including subject-level effects and variation within clusters over time. This structure could be further enriched, for example by using time series models for the development of subject and cluster effects over time, or by using cluster by treatment interaction terms to model variation in treatment effects across clusters. In its current form the model expresses the tension between cluster-effects and time-effects for the design of cluster studies in a form susceptible to exact analysis. At the design stage, the correlations are assumed known and, following the strategy of Hussey and Hughes [11], we use the model to compute the precision of unbiased linear estimates of a fixed treatment effect. This approach is moment-based and makes no explicit distributional assumption, although it is clearly justifiable when the data are normally distributed with the prescribed correlation-structure. In some other cases, where central limiting arguments can be deployed, it approximates to the best approach, for example, for binary outcomes when m is large.
Table III. Performance of best balanced and near-minimax designs under prior uncertainty for the CMC, R, (a) for 10 clusters with six observation times and (b) in the limiting case of large studies. The prior median is denoted by $R_0$, and CV is the coefficient of variation of the quantity R/(1 − R). The designs considered are: BB$_0$, the BBD at $R_0$; $^{0.6}H_3$, a near-minimax hybrid design (design 1 in Table II); Mmx (= $^{0.634}H_\infty$), the minimax design in large studies. Entries are the centiles (from 9999 simulations) of the relative precision of the stated design compared to the BBD at the true value of R. WRP is the relative precision at the least favourable value of R (= 0 or 1 in all cases).
The additivity requirement in the model can be relaxed within a generalised linear mixed model, with fixed and random effects on a transformed scale. Then our results may still apply approximately if these effects are relatively small, permitting a linear approximation on the natural scale.
Several aspects of this work offer potential applications in the design of both cross-sectional cluster studies (where each subject provides a single outcome measurement) and of cohort designs (where repeated measures are taken on the same subjects). First, the expression for precision in terms of the CMC leads to a convenient graphical tool for comparing the performance of competing study designs, and provides a simple approach to the computation of design effects, which are useful in sample size calculations. Second, the class of hybrid designs emerges as the class of admissible designs, at least in large studies, in the sense that any design not of this form is (weakly) dominated by at least one design that is. Cross-sectional studies can often be conceptualised as a series of observations taken over a large number of weeks or months, in which case the hybrid results may be directly applicable. Third, a simple search algorithm is proposed for finding optimal designs when the 'large-study' results are inapplicable. This is often the case in cohort studies, where the number of observation times is limited to the (usually small) number of repeat measures on each subject.
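As a minimal sketch of the moment-based precision calculation described above, the following Python code computes the precision of the generalised least squares estimate of the treatment effect from cluster-period means. It assumes the simplest cross-sectional case in the style of Hussey and Hughes [11]: a random cluster intercept with variance tau2, independent subject-level errors with variance sigma2, m subjects per cluster-period, and one fixed effect per time-point. It does not include the subject-level or cluster-by-time random effects of model (1). The function names, argument names and the 0-based uptake-time indexing are illustrative assumptions, not notation from the paper.

```python
import numpy as np

def stepped_design_matrix(uptake_times, T):
    """Return the K x T binary matrix J, with J[i, j] = 1 if cluster i
    is under treatment at time-point j (0-based). A cluster that never
    receives the intervention has uptake time T."""
    uptake = np.asarray(uptake_times)
    return (np.arange(T)[None, :] >= uptake[:, None]).astype(float)

def treatment_precision(J, m, sigma2, tau2):
    """Precision (1 / variance) of the GLS estimate of the treatment
    effect, assuming a random cluster intercept only (an assumption of
    this sketch, simpler than the full model in the paper)."""
    K, T = J.shape
    # Covariance of the T cluster-period means within a single cluster.
    V = (sigma2 / m) * np.eye(T) + tau2 * np.ones((T, T))
    V_inv = np.linalg.inv(V)
    # Fixed effects: T time parameters followed by the treatment effect.
    info = np.zeros((T + 1, T + 1))
    for i in range(K):
        X_i = np.hstack([np.eye(T), J[i][:, None]])
        info += X_i.T @ V_inv @ X_i
    # The information matrix is singular if all clusters share the same
    # uptake time, in which case the treatment effect is not estimable.
    cov = np.linalg.inv(info)
    return 1.0 / cov[T, T]
```

For example, two clusters observed at three time-points, with uptake at the second and third time-points, give `treatment_precision(stepped_design_matrix([1, 2], 3), m=1, sigma2=1.0, tau2=0.5)` equal to 0.4.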
The precise nature of the optimal design depends on the value of the CMC which, in common with the traditional ICC, may be difficult to predict with certainty. An alternative strategy is to look for a design that reduces the possible loss of precision because of an unknown CMC. For this purpose the near-minimax hybrids of Section 3.5 are offered as practical alternatives to more traditional designs.
from which (3) follows. The last equality holds because $Z_{i1} = \sqrt{T}\,J_{i\bullet}$, and

Fixed cluster effects
If the cluster effects γ i are treated as fixed, the regression model for U i1 in (A1) is not identifiable and does not yield an estimate for θ. Its contribution to the precision of the pooled estimate is zero, which is tantamount to setting R = 1 in the expression (A2).

Time-effects modelling
The model in (1) has one parameter for each of the T observation times in the study. However, the argument can be extended to show that the expression (A2) holds under more restrictive time-effects models.
To be precise, if the T × 1 vector of time effects is modelled as t = Xβ for some design matrix X, where β is a vector of unknown parameters, the result (A2) holds provided that the vector $J^{\mathsf{T}}\mathbf{1}_K$ (whose jth element is $J_{\bullet j}$) lies in the column-space of X. In a stepped design this amounts to the requirement that the model is rich enough to include a separate time parameter for any period in which no cluster experiences an uptake event. For the designs in Figure 1, it means that (A2) remains true provided the model includes at least one time-parameter for each of the columns depicted in the corresponding figure.
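The column-space condition above can be checked numerically for a given design and time-effects model. The sketch below is only an illustration: the function name, the tolerance and the use of a least squares residual to test membership of the column-space are assumptions of this example, not part of the paper's argument.

```python
import numpy as np

def time_model_is_adequate(J, X, tol=1e-10):
    """Return True if the vector of column totals of J (the number of
    treated clusters at each time-point) lies in the column-space of the
    time-effects design matrix X, up to a relative tolerance."""
    J = np.asarray(J, dtype=float)
    X = np.asarray(X, dtype=float)
    totals = J.sum(axis=0)  # jth element is the column total of J
    # Project the totals onto the column-space of X by least squares;
    # a zero residual means the totals lie in that space.
    coef, *_ = np.linalg.lstsq(X, totals, rcond=None)
    residual = totals - X @ coef
    scale = max(1.0, np.linalg.norm(totals))
    return bool(np.linalg.norm(residual) <= tol * scale)

# Three clusters, four time-points: two clusters take up the
# intervention at the second time-point, one at the fourth.
J = np.array([[0, 1, 1, 1],
              [0, 1, 1, 1],
              [0, 0, 0, 1]])

# One parameter per time-point (the saturated model).
X_saturated = np.eye(4)
# One parameter per period between uptake events: {1}, {2, 3}, {4}.
X_periods = np.array([[1, 0, 0],
                      [0, 1, 0],
                      [0, 1, 0],
                      [0, 0, 1]])
# A coarser model that merges time-points across an uptake event.
X_coarse = np.array([[1, 0],
                     [1, 0],
                     [0, 1],
                     [0, 1]])
```

In this example the saturated and period-based models satisfy the condition, whereas the coarser model does not, because it forces equal time effects at two time-points with different numbers of treated clusters.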
Stepped-wedge designs MSW g : Consider a truncated version of the SW g design in which the first column of control observations is absent. If there is just one cluster in each group, the corresponding J-matrix is square. For example, when g = 4, J = […] the last J i• time-points. It follows that ∑