Evaluation of cluster-randomized trials in maternal and child health research in developing countries
Objective: To summarize and evaluate all published cluster-randomized trials used for maternal and child health research in developing countries during the last 10 years.
Methods: All cluster-randomized trials published between 1998 and 2008 were reviewed, and those meeting our inclusion criteria were evaluated further. To be included, a trial had to have been conducted in maternal and child health care in a developing country and to have drawn its conclusions at the individual level. In the eligible trials, we evaluated the methods used to account for clustering in design and analysis.
Results: Thirty-five eligible trials were identified. Most were conducted in Asia, used the community as the unit of randomization, and had fewer than 10 000 participants. To minimize confounding, 23 of the 35 trials had stratified, blocked, or paired the clusters before randomization, and 17 had adjusted for confounding in the analysis. Ten of the 35 trials did not account for clustering in sample-size calculations, and seven did not account for the cluster-randomized design in the analysis. The number of cluster-randomized trials increased over time, and their quality generally improved.
Conclusions: Shortcomings exist in the sample-size calculations and analyses of cluster-randomized trials in maternal and child health research in developing countries. Although there has been improvement over time, further progress is needed in the way researchers in this field use and analyse cluster-randomized trials.
Reducing the maternal mortality ratio by 75% and the under-five mortality rate by two-thirds between 1990 and 2015 are key Millennium Development Goals (United Nations 2008). Because the worst conditions for mothers and children worldwide are found in developing countries, a serious effort is needed in these countries if these targets are to be met.
Many interventions have been implemented in maternal and child health care in developing countries over the years to reduce maternal and child mortality. Most have been assessed in individually randomized controlled trials, but for practical, ethical or economic reasons such trials are not always appropriate in developing countries. Randomizing clusters instead of individuals has proven to be a practical and often less costly alternative, and it is attractive in settings in which individual randomization is difficult or impossible (Hayes et al. 2000). Cluster randomization has proved particularly practical in maternal and child health care, where interventions that act on groups of people rather than only on individuals are common. Examples include immunization strategies and educational and nutritional interventions delivered through health service centres or the mass media.
Empirical evaluations have shown that methodological shortcomings in sample-size calculations and analysis are common in cluster-randomized trials in fields other than maternal and child health in developing countries. In an evaluation of all cluster-randomized trials conducted in sub-Saharan Africa up to 2001, Isaakidis and Ioannidis (2003) found that only 10 of 51 (20%) trials had accounted for clustering in sample-size calculations and only 37% had taken clustering into account in the analysis. Eldridge et al. (2004) found that only 20% of 199 reports of cluster-randomized trials in primary care had accounted for clustering in the design phase, and 59% had accounted for it in the analysis. Donner et al. (1990) found that only three of 16 (19%) studies of non-therapeutic interventions from 1979 to 1989 accounted for cluster randomization in the design phase, and eight of 16 (50%) took clustering into account in the analysis. Given these shortcomings in other fields, we expected similar limitations in maternal and child health research in developing countries. Because no evaluation of trials in this field had been done previously, we considered one worthwhile.
The aim of this evaluation was to summarize and evaluate the cluster-randomized trials in maternal and child health research conducted in the developing world. We report a methodological assessment of all such trials performed in the past 10 years and evaluate the extent to which the design and analysis requirements of cluster randomization were taken into account and properly reported in the trial publications. To evaluate the trials, we used two checklists based on the Consolidated Standards of Reporting Trials (CONSORT) statement (Moher et al. 2001; Campbell et al. 2004).
We sought all publications of cluster-randomized trials in maternal and child health research that randomized at the cluster level, drew conclusions at the individual level and were conducted in developing countries in the last 10 years. In March–April 2007 and March–June 2008, the available search engines (including PubMed, SCOPUS and the Cochrane Library) were searched for all relevant papers published in English between January 1998 and June 2008. The keywords used in the initial search were: cluster OR group OR community AND randomized OR randomised AND intervention OR trial.
In the initial search, all publications were screened against the eligibility criteria. To be eligible, trials had to: (i) have a cluster-randomized design; (ii) have been published between 1998 and 2008; (iii) have been conducted in maternal and child health research in a region designated by the United Nations as developing; and (iv) draw conclusions at the individual level. We recognize that cluster-randomized trials drawing conclusions at the cluster level are numerous and important in maternal and child health care, but we excluded them because they do not require adjustment for clustering and therefore do not carry the same risk of erroneous conclusions as trials with individual-level analyses (Chakraborty 2008).
Secondary publications of a main study were also included, provided that they reported different outcome variables and therefore used methods and analyses different from those of the primary study. In addition, whenever a secondary publication reported useful information about the design or analysis of the primary study, this information was recorded and used to give due credit to the trial. The reference lists of all eligible papers were reviewed to find additional eligible trials published during the study period. Papers that neither described the methods of design or analysis nor referred to another publication describing them were excluded.
In a secondary evaluation, two of the authors systematically examined each eligible article. Information on study characteristics, sample-size calculations, analysis and conclusions was extracted from each publication. Specifically, for each article we recorded: (i) whether the trial was identified as cluster randomized in the title; (ii) whether a rationale for the cluster-randomized design was stated; (iii) whether the level (cluster or individual) at which the interventions applied was described; (iv) whether stratification or pairing (an extreme form of stratification in which each stratum consists of two clusters that are randomly assigned to different arms) was used and, if so, whether a rationale was given; (v) whether the method of determining the sample size was described; (vi) whether the sample-size calculations took clustering into account; (vii) which method, if any, was used to account for cluster randomization in the sample-size calculations; (viii) whether the magnitude of the intracluster correlation coefficient (ICC), design effect or coefficient of variation was stated; (ix) whether the analysis adjusted for confounding; and (x) whether the analysis took clustering into account. Location, primary objective, publication year, sample size, number of clusters and cluster size were also recorded. This checklist was based on the CONSORT statement (Moher et al. 2001) and its extension for cluster-randomized trials (Campbell et al. 2004).
To judge whether clustering had been taken into account in sample-size calculations, we relied on the theory of within-cluster correlation. The ICC measures how similar the responses of participants within the same cluster are. The required sample size of a trial depends on the magnitude of the ICC: the larger the ICC, the more participants and clusters are needed. Consequently, to determine the sample size of a cluster-randomized trial, the ICC must be estimated before data collection begins. In practice, the ICC is estimated from previous trials, from pilot data collected before the main data collection, or by simulation (Chakraborty et al. 2009). From the ICC, a design effect is often calculated; it indicates by how much the sample size that would be appropriate for an individually randomized trial must be inflated for a cluster-randomized design (Chakraborty 2008). Another method of determining the sample size of a cluster-randomized trial is based on the between-cluster coefficient of variation (Hayes & Bennett 1999).
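The two sample-size routes described above can be sketched in Python as follows. This is a minimal illustration, not the method of any reviewed trial, and it assumes equal cluster sizes m, a binary outcome, and a two-sided test at α = 0.05 with 80% power. The design-effect route inflates the individually randomized sample size by DE = 1 + (m − 1) × ICC. The coefficient-of-variation route uses the Hayes & Bennett (1999) formula for the number of clusters per arm when comparing proportions in an unmatched design, where k is the between-cluster coefficient of variation. All function names and input values (prevalences of 10% and 7%, clusters of 50, ICC = 0.02, k = 0.25) are hypothetical.

```python
import math
from statistics import NormalDist
from typing import Tuple


def _ceil(x: float) -> int:
    """Round up, ignoring floating-point noise such as 1980.0000000000002."""
    return math.ceil(round(x, 9))


def n_per_arm_individual(p0: float, p1: float, alpha: float = 0.05, power: float = 0.80) -> int:
    """Participants per arm for an individually randomized comparison of two proportions."""
    z = NormalDist().inv_cdf(1 - alpha / 2) + NormalDist().inv_cdf(power)
    return _ceil(z ** 2 * (p0 * (1 - p0) + p1 * (1 - p1)) / (p0 - p1) ** 2)


def design_effect(cluster_size: float, icc: float) -> float:
    """DE = 1 + (m - 1) * ICC, assuming every cluster has the same size m."""
    return 1.0 + (cluster_size - 1.0) * icc


def n_per_arm_cluster(p0: float, p1: float, cluster_size: int, icc: float) -> Tuple[int, int]:
    """Inflate the individual sample size by the design effect; return (participants, clusters) per arm."""
    n = _ceil(n_per_arm_individual(p0, p1) * design_effect(cluster_size, icc))
    return n, _ceil(n / cluster_size)


def clusters_per_arm_cv(p0: float, p1: float, cluster_size: int, k: float,
                        alpha: float = 0.05, power: float = 0.80) -> int:
    """Hayes & Bennett (1999): clusters per arm for proportions, unmatched design, CV = k."""
    z = NormalDist().inv_cdf(1 - alpha / 2) + NormalDist().inv_cdf(power)
    within = (p0 * (1 - p0) + p1 * (1 - p1)) / cluster_size  # binomial variation inside clusters
    between = k ** 2 * (p0 ** 2 + p1 ** 2)                    # extra variation between clusters
    return _ceil(1 + z ** 2 * (within + between) / (p0 - p1) ** 2)


if __name__ == "__main__":
    # Hypothetical scenario: outcome prevalence 10% vs 7%, clusters of 50 participants.
    print(n_per_arm_individual(0.10, 0.07))
    print(n_per_arm_cluster(0.10, 0.07, cluster_size=50, icc=0.02))
    print(clusters_per_arm_cv(0.10, 0.07, cluster_size=50, k=0.25))
```

With these inputs, an individually randomized trial would need 1353 participants per arm. The design-effect route nearly doubles this to 2679 participants (54 clusters of 50) per arm, and the coefficient-of-variation route asks for 37 clusters per arm. The two routes are not directly comparable because ICC = 0.02 and k = 0.25 encode different assumptions about between-cluster variation.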
When analysing cluster-randomized data, conclusions can be drawn at either the cluster or the individual level. For cluster-level conclusions no adjustment for clustering is needed, because the unit of randomization is the same as the unit of analysis. By contrast, individual-level conclusions require accounting for within- and between-cluster correlation. Several methods allow conclusions at the individual level when clusters are randomized; what they have in common is that they account for clustering. If clustering is ignored in the analysis, standard errors are typically underestimated and there is a substantial risk of spurious statistical significance (Chakraborty 2008).
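The risk of spurious significance can be illustrated with a rough sketch. It assumes equal cluster sizes, a binary outcome and hypothetical prevalences of 10% versus 8% in 40 clusters of 50 per arm with ICC = 0.02. Dividing a naive z-statistic by the square root of the design effect is a simple approximation, equivalent to analysing the effective sample size n/DE. It is not one of the full methods a trial would normally use, such as cluster-level summaries, generalized estimating equations or random-effects models. The function names are our own.

```python
import math
from statistics import NormalDist


def naive_two_proportion_z(p0: float, p1: float, n_per_arm: int) -> float:
    """z-statistic that wrongly treats every participant as independently randomized."""
    pooled = (p0 + p1) / 2  # equal arm sizes assumed
    se = math.sqrt(pooled * (1 - pooled) * 2 / n_per_arm)
    return (p0 - p1) / se


def design_effect_adjusted_z(z_naive: float, cluster_size: float, icc: float) -> float:
    """Shrink z by sqrt(DE), i.e. analyse the effective sample size n / DE."""
    return z_naive / math.sqrt(1 + (cluster_size - 1) * icc)


def two_sided_p(z: float) -> float:
    """Two-sided p-value from the standard normal distribution."""
    return 2 * (1 - NormalDist().cdf(abs(z)))


if __name__ == "__main__":
    # Hypothetical trial: 40 clusters of 50 per arm (2000 participants), ICC = 0.02.
    z = naive_two_proportion_z(0.10, 0.08, n_per_arm=2000)
    z_adj = design_effect_adjusted_z(z, cluster_size=50, icc=0.02)
    print(f"naive:    z = {z:.2f}, p = {two_sided_p(z):.3f}")
    print(f"adjusted: z = {z_adj:.2f}, p = {two_sided_p(z_adj):.3f}")
```

With these inputs, the naive p-value falls below 0.05, whereas the clustering-adjusted p-value does not. The same data therefore lead to opposite conclusions depending on whether the cluster design is respected.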
Characteristics of studies
The initial search yielded more than 10 000 articles. All were screened against the eligibility criteria, and 35 papers were found eligible. In the secondary evaluation, these articles were examined in detail and the relevant information was extracted. An overview of the papers is given in Table 1.
Table 1. List of papers published from 1998 to 2008 using a cluster-randomized design and drawing conclusions at the individual level in maternal and child health research in developing countries
Columns: Location of study | Controlled for confounding | Accounted for cluster randomization in sample-size calculations | Adjusted for cluster randomization in analysis
MMN, multiple micronutrient; IFA, iron and folic acid.
Of the 35 papers, 11 were published in the Lancet, five in BMJ, three in BioMed Central journals, two in the American Journal of Tropical Medicine and Hygiene, and two each in Tropical Medicine and International Health, the American Journal of Clinical Nutrition and Pediatrics. One paper each was published in the Journal of Nutrition, the New England Journal of Medicine, the Journal of the American Medical Association, Food and Nutrition Bulletin, the journal of the American Society for Nutritional Sciences, Midwifery, General Obstetrics, and the Transactions of the Royal Society of Tropical Medicine and Hygiene.
Sixty-six per cent of the studies were conducted in Asia (predominantly in Nepal, Bangladesh and India), 20% in Africa and 11% in South America. One study was multisite and was conducted in both South America and Asia.
The primary objectives of the trials varied with the type of intervention. Ten trials evaluated nutritional interventions, seven dealt with preventing parasitic diseases such as malaria and helminth infections, five involved training traditional birth attendants or improving antenatal care in general, four dealt with mobilizing or training local communities, four had medical treatment or immunization as their primary objective, and one examined the psychosocial stimulation of children. Two interventions promoted breastfeeding, one hand washing, and one the use of primary health care.
The number of clusters per trial ranged from seven (Jokhio et al. 2005) to 88 940 (Bhandari et al. 2007). The most common unit of randomization was the community (54% of trials), but wards and health zones were also used (11% and 26% of trials, respectively); households were used less often (9%). Sample sizes ranged widely, from 136 (Hyder et al. 2007) to 350 000 (More et al. 2008) participants, although most studies (54%) had fewer than 10 000 participants. The average sample size was just under 26 000, and only five of the 35 studies had more than 100 000 participants. The mean cluster size ranged from just above one (Bhandari et al. 2007) to around 7300 (More et al. 2008). Mean cluster sizes above 200 were less common (26% of trials), whereas clusters with fewer than 50 participants were common (43% of trials).
Table 2 shows how well the trials complied with selected CONSORT guidelines. Only 51% of the papers identified themselves as cluster-randomized trials in the title, although most (61%) mentioned the cluster-randomized design in the abstract. Three of the publications that were not identified as cluster-randomized trials in the title instead described themselves as ‘community-randomized trials’ or ‘community trials’.
Table 2. Compliance with selected CONSORT guidelines
Columns: Studies that have included the item | Studies that have not included the item
Rows:
Identification of cluster-randomized design in title
Rationale for using a cluster-randomized design
Rationale for stratification, blocking or pairing*
Description of whether the interventions pertained to the cluster or the individual level
Description of how sample size was determined
Presentation of ICC, magnitude of design effect or coefficient of variation
Description of how clustering was taken into account in the statistical analyses
*Only 23 studies used stratification, blocking or pairing, so only those studies are represented in this row.
CONSORT, Consolidated Standards of Reporting Trials; ICC, intracluster correlation coefficient.
Six trials stated a rationale for using a cluster-randomized design. The most common rationale was that intervening at the cluster level avoided contamination between treatment arms (Hyder et al. 2007; Jokhio et al. 2005). In one trial (Majoko et al. 2007), the setting did not allow effective individual randomization, and in another (Powell et al. 2004) it was not feasible for children attending the same clinic to receive different treatments. Thus both the intervention and the setting influenced the choice of study design.
Table 3 lists some of the main findings of the included trials. All included trials allocated clusters to the intervention and control arms by randomization, which controls for confounding in the design phase. In addition, 16 of the 35 trials had stratified or blocked the clusters before randomization to ensure an equal distribution of baseline characteristics between the intervention and control groups. The stratification variables used in these trials included geographical distribution, access to health care, participants' weight and age, baseline mortality and morbidity rates, population density, ethnicity and gender. Pairing was also commonly used to avoid confounding: in addition to the 16 trials that used stratification or blocking, seven trials paired clusters before randomization, using matching variables similar to those used for stratification. Only seven of the 23 trials that used stratification, blocking or pairing stated a rationale for doing so, and in every case the rationale was to ensure baseline balance.
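The pairing procedure can be sketched as below. This minimal Python example assumes a single numeric matching variable, an even number of clusters and hypothetical village names and baseline infant mortality rates. It does not reproduce the procedure of any reviewed trial. Clusters are ranked on the matching variable, adjacent clusters form pairs, and a seeded random shuffle decides which member of each pair receives the intervention.

```python
import random
from typing import Dict


def pair_matched_allocation(baseline: Dict[str, float], seed: int = 1) -> Dict[str, str]:
    """Pair clusters with similar baseline values, then randomize within each pair."""
    if len(baseline) % 2:
        raise ValueError("pair matching needs an even number of clusters")
    rng = random.Random(seed)  # seeded so the allocation is reproducible and auditable
    ordered = sorted(baseline, key=baseline.get)  # rank clusters by the matching variable
    allocation: Dict[str, str] = {}
    for i in range(0, len(ordered), 2):
        pair = [ordered[i], ordered[i + 1]]  # adjacent ranks form a matched pair
        rng.shuffle(pair)  # random order decides which member receives the intervention
        allocation[pair[0]] = "intervention"
        allocation[pair[1]] = "control"
    return allocation


if __name__ == "__main__":
    # Hypothetical clusters with baseline infant mortality per 1000 live births.
    villages = {"V1": 62.0, "V2": 48.5, "V3": 61.0, "V4": 50.2, "V5": 75.3, "V6": 73.9}
    for village, arm in sorted(pair_matched_allocation(villages).items()):
        print(village, arm)
```

Stratified or blocked randomization follows the same pattern with larger strata: clusters are grouped on the stratification variables and then split at random, in equal numbers where possible, between the arms within each stratum.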
Table 3. Main findings in the 35 included studies
Number of trials that used stratification or blocking in design
Number of trials that used pairing in design
Number of trials that accounted for the cluster-level design in sample-size calculations
Methods used to account for the cluster-level design in sample-size calculations:
  ICC or design effect
  Coefficient of variation
Number of trials that adjusted for confounding in the analysis
Number of trials that accounted for the cluster-level design in the analysis
ICC, intracluster correlation coefficient.
All trials described whether the intervention applied at the cluster or the individual level, and about one-half (49%) adjusted for confounding in the analysis by controlling for various baseline variables. Six of the 35 trials did not account for confounding beyond randomization in either the design or the analysis phase.
Accounting for clustering
Ten of the 35 trials did not use an ICC, design effect or coefficient of variation to adjust for clustering in sample-size calculations. Of the 25 trials that took cluster randomization into account when calculating the sample size, 72% used an ICC or design effect and 28% used the coefficient-of-variation method.
Of the 18 trials that presented ICC values or design effects, eight had estimated the magnitude from previous trials, seven from data collected before the main data collection, and three did not state the origin of the value. Of the seven trials that used the coefficient of variation, two based its magnitude on existing data collected before the study and two on estimates available at the national level or for the specific area; the remaining three did not state the origin of the coefficients used.
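One common way to estimate the ICC from pilot data collected before the main data collection is the one-way analysis-of-variance (ANOVA) estimator. The Python sketch below is our own illustration. It assumes a numerically coded outcome (for example 0/1), uses hypothetical pilot data, and does not imply that the reviewed trials used this particular estimator. Negative estimates can arise by chance and are often set to zero in practice.

```python
from typing import Sequence


def anova_icc(clusters: Sequence[Sequence[float]]) -> float:
    """One-way ANOVA estimator of the ICC; cluster sizes may differ."""
    k = len(clusters)
    sizes = [len(c) for c in clusters]
    n = sum(sizes)
    if k < 2 or n <= k:
        raise ValueError("need at least two clusters and more participants than clusters")
    grand_mean = sum(sum(c) for c in clusters) / n
    means = [sum(c) / len(c) for c in clusters]
    # Between- and within-cluster mean squares.
    msb = sum(m * (mu - grand_mean) ** 2 for m, mu in zip(sizes, means)) / (k - 1)
    msw = sum((y - mu) ** 2 for c, mu in zip(clusters, means) for y in c) / (n - k)
    # Average cluster size adjusted for unequal sizes (equals m when all sizes are m).
    m0 = (n - sum(m * m for m in sizes) / n) / (k - 1)
    denominator = msb + (m0 - 1) * msw
    if denominator == 0:
        raise ValueError("ICC undefined: all responses are identical")
    return (msb - msw) / denominator


if __name__ == "__main__":
    # Hypothetical pilot: a binary outcome (1 = event) in five women from each of four villages.
    pilot = [[1, 1, 0, 0, 0], [1, 0, 0, 0, 0], [1, 1, 1, 0, 0], [0, 0, 0, 0, 0]]
    print(round(anova_icc(pilot), 4))
```

An estimate from only four small clusters like this one is very imprecise. This is one reason why published ICC values from comparable trials are valuable when planning a new trial.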
Seven of the 35 included trials did not account for the cluster-randomized design in the analysis; instead, the data were analysed as if individuals had been randomized. Of the 28 trials that did account for the cluster design, only 22 described how they did so.
Trials conducted in Bangladesh and Nepal had the most problems in accounting for cluster randomization: in both countries, more than one-half of the trials did not take clustering into account. In Nepal, the problem was most evident in the sample-size calculations (five of nine trials did not account for clustering in the sample-size calculations, while one of nine did not account for it in the analysis), whereas the trials in Bangladesh had equally many problems in sample-size calculations and analysis (two trials did not account for clustering in sample-size calculations, and two neglected to account for it in the analysis).
The distribution of papers by journal showed no tendency towards more appropriate conduct or reporting in the journals that had published cluster-randomized trials more often.
Change over time
As shown in Figures 1 and 2, the annual number of cluster-randomized trials in maternal and child health research conducted in developing countries increased over time. Only 8 (23%) of the 35 trials were published in the first 5 years of the period (1998–2002), whereas 27 (77%) were published in the last 5 years (2003–2008). The share of trials drawing conclusions correctly also increased over time: five of 8 (63%) trials published between 1998 and 2002 did not account for cluster randomization in the design phase, the analysis phase or both, compared with nine of 27 (33%) trials published in 2003–2008.
This report is the first coherent evaluation of all cluster-randomized trials with conclusions at the individual level that were conducted in maternal and child health research in developing countries during 1998–2008. A large proportion of the included trials used improper methods in sample-size calculations, analysis or both: 14 of 35 trials (40%) did not account appropriately for clustering in at least one of these stages.
In several trials, the analysis does not match the level at which conclusions are drawn. For example, Browne et al. (2001) perform analyses at the cluster level, although their conclusions about the incidence of Plasmodium falciparum infections, haemoglobin levels and delivery outcomes are drawn at the individual level. Similarly, in the trial by Schulman et al. (1998), the analysis is presented as being at the community level, but the conclusions are drawn at the individual level. Failing to distinguish between cluster- and individual-level analyses can lead to false inferences of significant associations between exposure and outcome.
Another recurring problem is the lack of justification for the magnitude of the adjustment for clustering in sample-size calculations. For example, Hyder et al. (2007) state that a design effect of two was used to account for clustering but give no reason for this value. Browne et al. (2001) increase the sample size by 15% to allow for clustering without explaining the choice of ICC or design effect, and Luby et al. (2004) double the sample size to account for clustering without presenting any rationale. Furthermore, as many as 17% (three of 18) of the trials that present an ICC or design effect do not state the origin of the value. These shortcomings in the documentation of sample-size calculations make it difficult, and in some cases impossible, to judge whether an appropriate sample size was used. We suspect, however, that when authors did not justify the magnitude of the adjustment for clustering, the value used was probably too small, and the trial was therefore underpowered.
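Inverting the relation DE = 1 + (m − 1) × ICC shows why a fixed inflation factor needs justification. Any given design effect implicitly assumes an ICC whose size depends on the mean cluster size m, and mean cluster size varied from just above one to around 7300 in the reviewed trials. This minimal sketch assumes equal cluster sizes and uses hypothetical cluster sizes of 20, 50 and 200. It does not reconstruct the calculations of the cited trials, and it treats a 15% increase as DE = 1.15.

```python
def implied_icc(design_effect: float, cluster_size: float) -> float:
    """Invert DE = 1 + (m - 1) * ICC to find the ICC a fixed design effect assumes."""
    if cluster_size <= 1:
        raise ValueError("cluster size must exceed 1")
    return (design_effect - 1.0) / (cluster_size - 1.0)


if __name__ == "__main__":
    # A 15% inflation (DE = 1.15) and a doubling (DE = 2) at hypothetical cluster sizes.
    for de in (1.15, 2.0):
        for m in (20, 50, 200):
            print(f"DE = {de:.2f}, m = {m:>3}: implied ICC = {implied_icc(de, m):.4f}")
```

For instance, doubling the sample size with clusters of 50 corresponds to an assumed ICC of 1/49 (about 0.020), but with clusters of 200 it corresponds to only about 0.005. Without the cluster size and the source of the ICC, a reader cannot tell whether a given inflation was adequate.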
Six of the 35 trials did not adjust for confounding in either the design or the analysis. Whether confounding factors were absent at these study sites is beyond the scope of this evaluation, but only one of the six trials reports having searched for confounders and found none; the remaining five do not report any consideration of confounding. This may not be a serious problem, however, because all the trials were randomized.
The findings of this evaluation show slightly more frequent use of correct methods to account for clustering than previous empirical evaluations of cluster-randomized trials in other fields. We also found a tendency towards improvement over time in the percentage of trials that used appropriate designs and analyses when drawing conclusions at the individual level. This improvement may partly explain why the trials in this evaluation performed better methodologically than those in earlier evaluations.
Despite this improvement, this evaluation shows that further progress is still needed in the way researchers use and analyse cluster-randomized trials in maternal and child health research in developing countries. In particular, better reporting and sharing of ICC values is needed, as the literature currently contains few published ICCs for maternal and child health in developing countries. Progress in these areas is essential if research in this field is to produce valid results and thereby help address the maternal and child health problems that developing countries face.