Patterns are useful because they tell us something about the processes that create them. When patterns deviate from the expected, we know that something unusual has happened. Random allocation of individuals from a population into different groups distributes both categorical and continuous variables in predictable patterns, the centre, spread and shape of which are necessary consequences of the interaction between the sampled population and the sampling method.

For example, the variation in the means of continuous variables, such as age, depends upon: (i) the mean age of the sampled population; (ii) the population’s age distribution; and (iii) the size of the sample. The distribution of mean values for each continuous variable is approximately normal (Gaussian), unless the population variable is very asymmetric (‘skewed’) and the samples are also small (often quoted as < 30 individuals). The distribution of means in such cases will be slightly skewed and may cluster more or less tightly around the population mean.
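The behaviour of sample means can be illustrated with a small simulation. This is a minimal sketch only: the exponential population (mean 1), the sample sizes of 5 and 100, the number of repeats and the random seed are all illustrative assumptions and are not taken from any of the trials discussed here.

```python
import random
import statistics


def sample_means(sample_size, n_samples, rng):
    """Draw n_samples samples of the given size and return their means."""
    return [
        statistics.fmean([rng.expovariate(1.0) for _ in range(sample_size)])
        for _ in range(n_samples)
    ]


def skewness(values):
    """Population skewness: the mean cubed standardised deviation."""
    mean = statistics.fmean(values)
    sd = statistics.pstdev(values)
    return statistics.fmean([((v - mean) / sd) ** 3 for v in values])


# Hypothetical, strongly skewed population: exponential with mean 1.
rng = random.Random(2012)  # fixed seed so the sketch is repeatable
small_means = sample_means(sample_size=5, n_samples=2000, rng=rng)
large_means = sample_means(sample_size=100, n_samples=2000, rng=rng)

for label, means in (("n = 5", small_means), ("n = 100", large_means)):
    print(
        label,
        "spread of means:", round(statistics.pstdev(means), 3),
        "skewness of means:", round(skewness(means), 3),
    )
```

Under these assumptions, the means of the larger samples should cluster more tightly around the population mean and be less skewed than the means of the smaller samples.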

Similarly, the variation in the proportions of binomial characteristics, such as sex, depends upon: (i) the proportions of each sex in the sampled population; and (ii) the size of the sample. The shape and asymmetry of binomial distributions change with these two variables.
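These dependencies can be written out using the standard results for the binomial distribution. The symbols are introduced here for illustration only: $n$ is the number of individuals in a group, $p$ is the proportion of the population with the characteristic and $X$ is the number of individuals in the group who have it.

$$
\mathrm{E}\!\left[\frac{X}{n}\right] = p, \qquad
\mathrm{SD}\!\left[\frac{X}{n}\right] = \sqrt{\frac{p(1-p)}{n}}, \qquad
\mathrm{skewness}(X) = \frac{1-2p}{\sqrt{n\,p(1-p)}}
$$

The distribution is therefore symmetric only when $p = 0.5$, and for any other proportion the asymmetry shrinks as the group size increases.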

Significant deviation from the expected occurrences of one binomial characteristic in the outcomes reported by one particular anaesthetic researcher was publicised by Kranke et al., who commented that: ‘Reported data on granisetron and postoperative nausea and vomiting by Fujii et al. are incredibly *nice*!’ [1]. Kranke et al. concluded by observing: ‘…we have to conclude that there must be an underlying influence causing such incredibly nice data reported by Fujii et al.’

Kranke et al. had looked at 47 randomised controlled trials (RCTs) of antiemetics to prevent postoperative nausea and vomiting (PONV), published between 1994 and 1999 by Dr Yoshitaka Fujii and colleagues (references 1–46; Appendix S1; available online, please see details at the end of the paper). Eighteen of these RCTs had reported postoperative rates of headache. Ten had reported the same rate of headache in every group; for instance, in one paper, Fujii et al. reported that they had randomly allocated 270 women to one of six groups (reference 1; Appendix S1). Eighteen of the 270 women had postoperative headaches: 3/45 in each of the six groups. Table 1 shows the reported and expected (i.e. by chance) rates of headache in such patients.

Women with a headache in a group of 45 | Groups reported with this incidence of headache in this study | Groups expected with this incidence if headaches were distributed randomly across groups |
---|---|---|
0 | 0 | 0.3 |
1 | 0 | 0.9 |
2 | 0 | 1.4 |
3 | 6 | 1.4 |
4 | 0 | 1.0 |
5 | 0 | 0.6 |
6 | 0 | 0.3 |
7 | 0 | 0.1 |
Kranke et al. proceeded to reject the null hypothesis that 10/18 RCTs would report homogeneous rates of headache by chance, calculating a probability of 6.8 × 10^{-9}, or 1 in 147 million. My slight concern is that this indirect calculation confused the probability that a particular incidence would occur with the probability that this incidence is consistent with the expected binomial distribution. Kranke et al. calculated the first probability, but it is the second that I am more interested in. This turns out to be ∼1 in 5600 for the distribution of headache reported in all 18 RCTs that Kranke et al. analysed: larger than the estimate by Kranke et al., but still 280 times smaller than the p < 0.05 threshold conventionally regarded as statistically significant.
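The conversions between these probabilities can be checked with simple arithmetic. The sketch below takes the probability calculated by Kranke et al. as 6.8 × 10^{-9}, the value consistent with 1 in 147 million; the variable names are illustrative.

```python
# Probability calculated by Kranke et al., expressed as "1 in N".
kranke_p = 6.8e-9
one_in_kranke = 1 / kranke_p

# Recalculated probability of about 1 in 5600, compared with 0.05.
recalculated_p = 1 / 5600
threshold = 0.05
times_below_threshold = threshold / recalculated_p

print(f"Kranke et al.: 1 in {one_in_kranke / 1e6:.0f} million")
print(f"Recalculated: {times_below_threshold:.0f} times smaller than {threshold}")
```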

Kranke et al. had concluded that it was more likely that an ‘unnatural mechanism’ had obliterated the expected binomial distribution. Moore et al. mention these ‘suspect’ data in their editorial on scientific fraud [2].

My purpose in this study was to extend the statistical analysis of papers, begun by Kranke et al., to all RCTs published by Fujii. Identification of unnatural patterns of categorical and continuous variables would support the conclusion that these data depart so far from those expected from random sampling that they should not contribute to the evidence base.