## INTRODUCTION

Many randomised controlled trials in perinatal medicine evaluate treatments that are given to women antenatally or during labour but measure outcomes on their babies. Some treatments given to women are specifically intended to improve outcomes for the baby, but even where the main effect of the treatment is on the woman rather than the baby, data about the baby are often important secondary outcomes.

Antenatal recruitment and neonatal outcome measurement may cause a problem in the analysis when the trial population includes women with multiple pregnancies, because the outcomes of offspring from a multiple pregnancy are not independent. Babies from the same pregnancy are likely to have more similar outcomes than babies from different pregnancies, for several reasons: they will all be exposed to the same conditions before birth, they will be genetically similar or identical, and hence may react in the same way to interventions, and they may affect each other, so that one baby having a particular outcome may make the others in the same pregnancy more likely to have it.

Inclusion of non-independent data means that the “effective sample size” of the trial is reduced: there are fewer independent outcomes in the trial than the number of babies that took part in it. Analysing all babies as if they were independent will therefore overstate the sample size and give confidence intervals that are too narrow. The extent to which the effective sample size is reduced will depend on the degree of dependence between babies from multiple pregnancies and the proportion of multiples recruited to the trial. If multiple pregnancies make up only a small percentage of the trial, there is probably little potential for them to affect the results. Some trials, however, recruit a substantial proportion of women with multiple pregnancies. For example, in the Antenatal TRH trial,^{1} nearly 20% (44/225) of the women recruited had multiple pregnancies, and they contributed 34% (94/275) of the babies in the trial. Even where multiples make up only a relatively small percentage of the trial's population, it is common to analyse single and multiple pregnancies separately in subgroup analyses. In multiple pregnancy subgroups, the potential for non-independent data to influence the analysis is clearly great.
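The dependence of the effective sample size on both the degree of dependence and the proportion of multiples can be made concrete with the variance inflation factor ("design effect") used in the cluster-trial literature. The sketch below assumes a single intracluster correlation, here called `icc`, shared by all pregnancies; the function names and the two hypothetical trials are illustrative only and do not come from any trial data.

```python
def design_effect(cluster_sizes, icc):
    """Variance inflation factor for a baby-level proportion when babies are
    clustered within pregnancies, assuming one common intracluster correlation.

    Equivalent to 1 + (m_A - 1) * icc, where m_A = sum(m**2) / sum(m).
    """
    n_babies = sum(cluster_sizes)
    m_a = sum(m * m for m in cluster_sizes) / n_babies
    return 1 + (m_a - 1) * icc


def effective_sample_size(cluster_sizes, icc):
    """Number of babies divided by the design effect."""
    return sum(cluster_sizes) / design_effect(cluster_sizes, icc)


# Hypothetical trials, each with 200 babies (one list entry per pregnancy).
all_twins = [2] * 100                  # 100 twin pregnancies
mostly_single = [1] * 180 + [2] * 10   # 180 singletons plus 10 twin pairs

for icc in (0.0, 0.5, 1.0):
    print(icc,
          round(effective_sample_size(all_twins, icc), 1),
          round(effective_sample_size(mostly_single, icc), 1))
```

With `icc = 1` (twins always share the same outcome) the 100 twin pregnancies count as 100 independent observations, the same halving used in the worked example that follows; with few multiples the reduction is much smaller.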

Here we discuss and compare the methods that have been used for analysis of data sets containing multiple pregnancies and suggest applying methods that have been developed for analysis of cluster randomised trials. These adjust the analysis to take account of non-independence. The methods are illustrated using data from the Antenatal TRH trial and two simulated trial data sets.

Probably the most common method of analysis is to ignore non-independence between babies from multiple pregnancies and to assume that each baby is an independent data point. The sample size used in the analysis will therefore be larger than the effective sample size, giving confidence intervals that are too narrow. However, estimates of the risk in each group, and hence the relative risk or odds ratio, are not affected by non-independence. This is because non-independence reduces both the effective number of babies with an outcome and the effective total number of babies by the same factor. For example, if 50/200 babies from 100 twin pregnancies have respiratory distress syndrome, and the effective sample size is half of the total number (i.e. there are 100 independent data points), the estimate of the risk would be 25/100, which is the same proportion (0.25) as 50/200. However, the confidence interval around 25/100 (0.18, 0.34) is wider than that around 50/200 (0.20, 0.31), because of the smaller sample size.
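The text does not state how these intervals were calculated, but the figures quoted are reproduced by the Wilson score interval, so the sketch below assumes that method (`wilson_ci` is an illustrative name, not part of any package). It shows that halving the denominator leaves the point estimate unchanged while widening the interval.

```python
import math


def wilson_ci(events, n, z=1.96):
    """Wilson score confidence interval for a binomial proportion."""
    p = events / n
    denom = 1 + z ** 2 / n
    centre = (p + z ** 2 / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z ** 2 / (4 * n ** 2)) / denom
    return centre - half, centre + half


# All 200 babies treated as independent versus an effective sample of 100.
for events, n in [(25, 100), (50, 200)]:
    lo, hi = wilson_ci(events, n)
    print(f"{events}/{n}: risk {events / n:.2f}, 95% CI ({lo:.2f}, {hi:.2f})")
```

This prints intervals of (0.18, 0.34) for 25/100 and (0.20, 0.31) for 50/200, matching the values given above.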

Assuming independence between babies has the advantages of being easy to apply and including all of the trial's data in the analysis.

An alternative method, used by some perinatal trials,^{2} is to use the number of women recruited as the denominator, counting a woman as having an outcome if any of her babies has it (analysis by pregnancy). This is equivalent to taking the worst outcome among any of a woman's babies as the outcome for that woman. This method avoids including non-independent data in the analysis, but it disregards part of the data set; where one of a set of multiples has an outcome, the others do not contribute to the analysis. There is therefore a cost in collecting data that are not used.

Analysis by pregnancy will often give a different risk estimate from that obtained by assuming independence. For example, if exactly one baby in every pair of twins died, this method would suggest 100% death, compared with 50% if all babies were included in the analysis. It addresses the mothers' risk of having one or more babies that die, rather than the babies' risk of death.
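The two denominators can be compared directly. In this minimal sketch each woman is represented as a list of 0/1 outcomes for her babies (a data layout assumed here for illustration); the per-pregnancy rule counts a woman as affected if any of her babies is affected, reproducing the 100% versus 50% contrast for twins in which exactly one baby died.

```python
def risk_per_baby(pregnancies):
    """Assume independence: every baby is a separate data point."""
    babies = [y for preg in pregnancies for y in preg]
    return sum(babies) / len(babies)


def risk_per_pregnancy(pregnancies):
    """Analysis by pregnancy: a woman has the outcome if any of her babies has it."""
    return sum(1 for preg in pregnancies if any(preg)) / len(pregnancies)


# Hypothetical: 50 twin pairs, exactly one baby died in each pair.
twins_one_died = [[1, 0]] * 50
print(risk_per_baby(twins_one_died))       # 0.5
print(risk_per_pregnancy(twins_one_died))  # 1.0
```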

Another method that has been used^{3} is to select at random one infant from each multiple pregnancy for inclusion in the analysis (random selection). This avoids including non-independent data, but again it excludes some babies from the analysis, and therefore some data are collected but not used. As with analysis by pregnancy, the sample size for this method will be equal to the number of women recruited, as only one baby from each pregnancy is included.

The random selection element in this method means that the result will not be exactly the same if the analysis is repeated. If there is, in reality, no difference between the groups, it is possible that, by chance, random selection will produce a spuriously large difference. Similarly, a real difference may be obscured. These misleading results will be rare, but may not be detected unless the analysis is repeated several times. This then raises the issue of which result should be presented in the trial's publication.
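The run-to-run variability is easy to demonstrate. The sketch below, using hypothetical data and an illustrative function name, draws one baby uniformly at random from each pregnancy and repeats the analysis ten times; the observed risk changes between repetitions, and only fixing the random seed makes a given result reproducible.

```python
import random


def random_selection_risk(pregnancies, rng):
    """Pick one baby at random from each pregnancy and return the observed risk."""
    chosen = [rng.choice(preg) for preg in pregnancies]
    return sum(chosen) / len(chosen)


# Hypothetical data: 16 twin pairs with one affected baby, 4 with neither affected.
twins = [[1, 0]] * 16 + [[0, 0]] * 4

rng = random.Random(2024)  # fixed seed so the sequence can be repeated
results = [random_selection_risk(twins, rng) for _ in range(10)]
print(results)
print(min(results), max(results))
```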

The range of possible results that random selection can give is constrained by the number of multiple pregnancies in which all babies have the same outcome. For example, if among 20 sets of twins, in 8 sets both babies died, and in the remaining 12 sets, both survived, then random selection of one baby from each pair of twins will always find that 8 of the 20 selected babies died. However, if the 16 deaths occurred in 16 different twin pregnancies (i.e. in 16 pregnancies, 1 baby died and in 4 neither died), then random selection could produce anywhere between 0 and 16 deaths from the 20 sets of twins, with the average being 8.
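These bounds follow from simple counting: the minimum is the number of pregnancies in which every baby has the outcome, the maximum is the number in which at least one baby has it, and the expected count sums each pregnancy's proportion of affected babies. The sketch below (hypothetical names, same list-of-outcomes layout as above, uniform selection assumed) reproduces both twin scenarios from the text.

```python
from fractions import Fraction


def selection_range(pregnancies):
    """Minimum, maximum and expected number of affected babies when one baby
    is drawn uniformly at random from each pregnancy."""
    minimum = sum(1 for preg in pregnancies if all(preg))  # every baby affected
    maximum = sum(1 for preg in pregnancies if any(preg))  # at least one affected
    expected = sum(Fraction(sum(preg), len(preg)) for preg in pregnancies)
    return minimum, maximum, expected


concordant = [[1, 1]] * 8 + [[0, 0]] * 12   # both twins died in 8 sets
discordant = [[1, 0]] * 16 + [[0, 0]] * 4   # one twin died in 16 sets
print(selection_range(concordant))  # (8, 8, Fraction(8, 1))
print(selection_range(discordant))  # (0, 16, Fraction(8, 1))
```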

In a trial comparing two groups, there is therefore a range of possible results (odds ratios or relative risks) using this method. There will be an “average” result, when each group has the average number of outcomes, and two extremes, which occur when the maximum possible number of outcomes is selected by chance in one group and the minimum possible in the other, and *vice versa*.
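Combining the per-group bounds gives the extreme and "average" odds ratios. This sketch uses two hypothetical arms of 20 twin pregnancies each (invented for illustration, not trial data) and does not handle zero cells or groups in which every selected baby has the outcome.

```python
def bounds(pregnancies):
    """Min, max and expected affected count under random selection of one baby."""
    lo = sum(1 for preg in pregnancies if all(preg))
    hi = sum(1 for preg in pregnancies if any(preg))
    mean = sum(sum(preg) / len(preg) for preg in pregnancies)
    return lo, hi, mean


def odds_ratio(events_1, n_1, events_2, n_2):
    """Odds ratio of group 1 versus group 2 (no zero-cell handling)."""
    return (events_1 / (n_1 - events_1)) / (events_2 / (n_2 - events_2))


def or_range(treated, control):
    """Lowest, 'average' and highest odds ratio obtainable by random selection."""
    n_t, n_c = len(treated), len(control)  # one selected baby per pregnancy
    t_lo, t_hi, t_mean = bounds(treated)
    c_lo, c_hi, c_mean = bounds(control)
    return (odds_ratio(t_lo, n_t, c_hi, n_c),      # fewest treated, most control
            odds_ratio(t_mean, n_t, c_mean, n_c),  # both groups at their average
            odds_ratio(t_hi, n_t, c_lo, n_c))      # most treated, fewest control


# Hypothetical arms: [1, 1] both died, [1, 0] one died, [0, 0] neither died.
treated = [[1, 1]] * 4 + [[1, 0]] * 6 + [[0, 0]] * 10
control = [[1, 1]] * 6 + [[1, 0]] * 8 + [[0, 0]] * 6
print([round(x, 3) for x in or_range(treated, control)])
```

In this invented example the possible odds ratios run from about 0.11 to 2.33 around an average of about 0.54, so the apparent direction of effect could depend on the random draw.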

Further analytical methods have been proposed for taking account of non-independence between data points.^{4–6} These have dealt with situations, such as ophthalmology, where each individual can contribute one or two eyes to an analysis, or dental data, where several data points may be provided by each subject. These situations, and trials that include multiple pregnancies, are similar to cluster randomised trials, where groups of participants, or “clusters” (e.g. GP surgeries, villages, towns, schools or patients of a single consultant), are randomly allocated to the interventions being compared.^{7} In perinatal trials, each woman can be regarded as a cluster, with the number of individuals in the cluster being equal to the number of fetuses in her pregnancy. These clusters are much smaller than those in many cluster randomised trials where clusters may include hundreds or even thousands of individuals, but the principle of adjustment of the analyses to take account of similarity between members of the same cluster is exactly the same.

Donner and Klar^{8} present methods for analysis of cluster randomised trials that can be directly applied to data from trials including multiple pregnancies. The calculations are relatively straightforward: they can be carried out on a spreadsheet and are also implemented in the ACLUSTER software program.^{9} They include methods for calculating the odds ratio and its confidence interval, which are the most suitable for analysis of randomised controlled trials (see Appendix 1). They also provide methods for calculating adjusted χ^{2} statistics to take account of clustering, but these are not considered further.
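To illustrate the general idea of adjusting for clustering, the sketch below inflates each arm's contribution to the Woolf variance of the log odds ratio by a variance inflation factor of the form Σm(1 + (m − 1)ρ)/Σm, where m is the number of babies per pregnancy and ρ (`icc`) is an assumed common intracluster correlation. This is a generic design-effect adjustment written for illustration; it is not necessarily the calculation given in Appendix 1 or implemented in ACLUSTER, and the data are hypothetical.

```python
import math


def inflation_factor(cluster_sizes, icc):
    """sum(m * (1 + (m - 1) * icc)) / sum(m), assuming a common icc."""
    return sum(m * (1 + (m - 1) * icc) for m in cluster_sizes) / sum(cluster_sizes)


def adjusted_or_ci(events_1, sizes_1, events_2, sizes_2, icc, z=1.96):
    """Odds ratio (group 1 vs group 2) with a log-scale (Woolf) interval whose
    per-arm variance terms are multiplied by that arm's inflation factor.
    Zero cells are not handled."""
    a, b = events_1, sum(sizes_1) - events_1  # events / non-events, group 1
    c, d = events_2, sum(sizes_2) - events_2  # events / non-events, group 2
    log_or = math.log((a * d) / (b * c))
    var = (inflation_factor(sizes_1, icc) * (1 / a + 1 / b)
           + inflation_factor(sizes_2, icc) * (1 / c + 1 / d))
    half_width = z * math.sqrt(var)
    return (math.exp(log_or),
            math.exp(log_or - half_width),
            math.exp(log_or + half_width))


# Hypothetical arms: 90 singletons + 15 twin pairs (120 babies) in each.
sizes = [1] * 90 + [2] * 15
print(adjusted_or_ci(30, sizes, 45, sizes, icc=0.0))  # no adjustment
print(adjusted_or_ci(30, sizes, 45, sizes, icc=0.6))  # same OR, wider interval
```

Setting `icc = 0`, or analysing singletons only, returns the usual unadjusted interval; the point estimate is unchanged by the adjustment, consistent with the earlier observation that non-independence affects precision rather than the estimated risks.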

Because the methods described above can give different estimates of the risk in each group, their estimates of the difference between the groups (measured by relative risk, odds ratio or risk difference) may also differ.