joint first authors.
Modelling the hierarchical structure in datasets with very small clusters: a simulation study to explore the effect of the proportion of clusters when the outcome is continuous
Article first published online: 1 OCT 2012
Copyright © 2012 John Wiley & Sons, Ltd.
Statistics in Medicine
Volume 32, Issue 8, pages 1429–1438, 15 April 2013
How to Cite
Sauzet, O., Wright, K.C., Marston, L., Brocklehurst, P. and Peacock, J.L. (2013), Modelling the hierarchical structure in datasets with very small clusters: a simulation study to explore the effect of the proportion of clusters when the outcome is continuous. Statist. Med., 32: 1429–1438. doi: 10.1002/sim.5638
- Issue published online: 14 MAR 2013
- Manuscript Accepted: 6 SEP 2012
- Manuscript Received: 9 MAR 2011
- non-independent data;
- small clusters;
- mixed model;
- linear regression;
In cluster-randomised trials, the problem of non-independence within clusters is well known, and the appropriate statistical analysis is documented. Clusters typically seen in cluster trials are large in size and few in number, whereas datasets of preterm infants incorporate clusters of size two (twins), size three (triplets) and so on, with the majority of infants being in ‘clusters’ of size one. In such situations, it is unclear whether adjustment for clustering is needed or even possible. In this paper, we compared analyses allowing for clustering (linear mixed model) with analyses ignoring clustering (linear regression). Through simulations based on two real datasets, we explored estimation bias in predictors of a continuous outcome in datasets of different sizes typical of preterm samples, with varying percentages of twins. Overall, the biases for estimated coefficients were similar for linear regression and mixed models, but the standard errors were consistently much less well estimated when using linear regression. Non-convergence was rare overall but occurred in approximately 5% of mixed models for samples of fewer than 200 with 2% twins or fewer. We conclude that in datasets with small clusters, mixed models should be the method of choice irrespective of the percentage of twins. If the mixed model does not converge, a linear regression can be fitted, but the standard errors will be underestimated, and so the type I error may be inflated.
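The claim that linear regression underestimates standard errors when twins are present can be illustrated with a small simulation. The sketch below is not the authors' simulation (theirs was based on two real datasets). It assumes a simple random-intercept data-generating model in which twins share a predictor value and a cluster effect, and every name and parameter value in it (`simulate_dataset`, `ols_slope_and_se`, `se_ratio`, an intracluster correlation of 0.8, a slope of 0.5) is a hypothetical choice made for illustration. It uses only the Python standard library, so it fits the linear regression alone and does not fit a mixed model. For each scenario it reports the ratio of the average model-based standard error to the empirical standard deviation of the slope estimates across replicates. A ratio below 1 suggests that the model-based standard error is too small.

```python
import math
import random
import statistics


def simulate_dataset(n_singletons, n_twin_pairs, icc, slope, rng):
    """Return (xs, ys) for all infants; twins share x and a random intercept.

    Total residual variance is fixed at 1, so `icc` is the share of that
    variance attributable to the shared (between-cluster) effect.
    """
    sd_between = math.sqrt(icc)        # SD of the shared cluster effect
    sd_within = math.sqrt(1.0 - icc)   # SD of the infant-level residual
    xs, ys = [], []
    cluster_sizes = [1] * n_singletons + [2] * n_twin_pairs
    for size in cluster_sizes:
        x = rng.gauss(0.0, 1.0)          # cluster-level predictor (assumed)
        u = rng.gauss(0.0, sd_between)   # random intercept shared by twins
        for _ in range(size):
            xs.append(x)
            ys.append(slope * x + u + rng.gauss(0.0, sd_within))
    return xs, ys


def ols_slope_and_se(xs, ys):
    """Simple linear regression that ignores clustering: slope and its SE."""
    n = len(xs)
    mean_x = statistics.fmean(xs)
    mean_y = statistics.fmean(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b1 = sxy / sxx
    b0 = mean_y - b1 * mean_x
    rss = sum((y - b0 - b1 * x) ** 2 for x, y in zip(xs, ys))
    se = math.sqrt(rss / (n - 2) / sxx)  # usual model-based standard error
    return b1, se


def se_ratio(n_singletons, n_twin_pairs, icc=0.8, slope=0.5,
             n_reps=400, seed=1):
    """Mean model-based SE divided by the empirical SD of the slope."""
    rng = random.Random(seed)
    slopes, ses = [], []
    for _ in range(n_reps):
        xs, ys = simulate_dataset(n_singletons, n_twin_pairs, icc, slope, rng)
        b1, se = ols_slope_and_se(xs, ys)
        slopes.append(b1)
        ses.append(se)
    return statistics.fmean(ses) / statistics.stdev(slopes)


if __name__ == "__main__":
    # Each scenario has 200 infants; only the number of twin pairs changes.
    for n_pairs in (0, 10, 50, 100):
        ratio = se_ratio(200 - 2 * n_pairs, n_pairs)
        print(f"twin pairs: {n_pairs:3d}  SE ratio: {ratio:.2f}")
```

Under these assumptions the ratio should sit near 1 when there are no twins and fall as the share of twins grows. The effect is strongest here because the predictor is shared within a twin pair. A predictor that varies within pairs would behave differently, and comparing against a mixed model would require a statistics package outside the standard library.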