Evidence for non-random sampling in randomised, controlled trials by Yuhji Saitoh



A large number of randomised trials authored by Yoshitaka Fujii have been retracted, in part as a consequence of a previous analysis finding a very low probability of random sampling. Dr Yuhji Saitoh co-authored 34 of those trials and he was corresponding author for eight of them. We found a number of additional randomised, controlled trials that included baseline data, with Saitoh as corresponding author, that Fujii did not co-author. We used Monte Carlo simulations to analyse the baseline data from 32 relevant trials in total as well as an outcome (muscle twitch recovery ratios) reported in several. We also compared a series of muscle twitch recovery graphs appearing in a number of Saitoh's publications. The baseline data in 14/32 randomised, controlled trials had p < 0.01, of which seven p values were < 0.001. Eight trials reported four ratios of the time for the return of muscle activity after neuromuscular blockade, the distributions of which were homogeneous: the p values for the observed Q statistics were 0.0055, 0.031, 0.016 and 0.0071. Comparison of graphs revealed multiple coincident or near-coincident curves across a large number of publications, a finding also inconsistent with random sampling. Combining the continuous and categorical probabilities of the 32 included trials, we found a very low likelihood of random sampling: p = 1.27 × 10⁻⁸ (1 in 100,000,000). The high probability of non-random sampling and the repetition of lines in multiple graphs suggest that further scrutiny of Saitoh's work is warranted.


In 2006, an analysis of homogeneity in meta-analyses identified an extreme degree of between-study homogeneity in five studies published by Joachim Boldt [1]. Suspicions raised by readers of a 2009 publication subsequently led to an institutional investigation and ultimately the retraction of more than 90 of Boldt's published studies for lack of ethics approval and fabrication of data [2].

In 2012, a similar analysis of the baseline variables in a large number of studies published by Yoshitaka Fujii found a very low probability of random sampling [3]. This evidence formed an important part of the request that prompted a multi-institutional investigation of Fujii's publications, ultimately leading to the recommendation that over 180 papers should be retracted, again for lack of ethics approval and fabrication [4]. The methods used for the analysis have subsequently been refined [5]. One of Fujii's co-authors on 34 of those retracted papers was Dr Yuhji Saitoh, who was first and corresponding author on eight of these trials.

Following concerns raised over a new submission to the journal Anaesthesia and Intensive Care, we undertook a more focused analysis of data in randomised, controlled trials with Dr Yuhji Saitoh as an author.


In 2013, randomised, controlled trials published in six anaesthesia journals (2002–2012) were surveyed (unpublished). The distributions of mean (SD) for baseline variables were analysed using a published method [3]. Additional studies were retrieved for authors of at least two trials for which p < 0.05. The analyses were repeated using Monte Carlo simulations, which are more reliable than the method used for Fujii (as described in the June 2015 issue of Anaesthesia). The method used to analyse baseline continuous variables has been described in detail [3, 5]. In summary, Monte Carlo simulations were used instead of an independent t-test or ANOVA to generate a p value for differences between means. The aspect of interest is the probability that the difference in means would be less than reported (the left-hand tail of the distribution), which is equal to 1 − p, where ‘p’ is the p value generated by a two-sided t-test or ANOVA. However, parametric tests of summary data generate p = 1 when the reported means are the same, as if they were identical to an infinite number of decimal places. Monte Carlo simulations are needed when the precision of means is insufficient to discriminate their differences. Monte Carlo simulations were also used for baseline categorical variables, and Stouffer's method was used to combine p values for continuous variables, categorical variables and all baseline variables. The Kolmogorov–Smirnov test was used to compare the p values of variables and of randomised, controlled trials against a uniform distribution. The homogeneity of the standardised mean differences in the twitch recovery times (in a train-of-four) in the relevant studies was also analysed using Monte Carlo simulations for the Q statistic, as well as for the tau statistic and effect size probability. This type of analysis was used to identify the unusual homogeneity in the results of Boldt et al. in 2006 [1].
The code used to program the Monte Carlo simulations is available as an online appendix (Appendix S1).
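
As an illustration only (this is not the authors' code from Appendix S1), the idea of a Monte Carlo p value for rounded summary means can be sketched in Python. The function name, the use of a single common SD, and the shortcut of drawing group means directly from their normal sampling distribution are all assumptions made for brevity.

```python
import random


def similar_means_probability(mean_a, mean_b, sd, n_a, n_b, decimals,
                              sims=20_000, seed=1):
    """Estimate the probability that two randomly sampled groups would have
    reported (rounded) means at least as close as those observed.

    Hypothetical helper: assumes both groups share one SD and that group
    means are normally distributed around a common population mean.
    """
    rng = random.Random(seed)
    observed = abs(round(mean_a - mean_b, decimals))
    # Common population mean under the null hypothesis of random sampling.
    pooled = (mean_a * n_a + mean_b * n_b) / (n_a + n_b)
    se_a = sd / n_a ** 0.5
    se_b = sd / n_b ** 0.5
    hits = 0
    for _ in range(sims):
        # Simulated group means, rounded to the precision used in the paper.
        sim_a = round(rng.gauss(pooled, se_a), decimals)
        sim_b = round(rng.gauss(pooled, se_b), decimals)
        if round(abs(sim_a - sim_b), decimals) <= observed:
            hits += 1
    return hits / sims


# Hypothetical example: both groups report a mean age of 54 years
# (no decimal places), SD 10, 20 participants per group.
print(similar_means_probability(54, 54, 10, 20, 20, decimals=0))
```

When the two reported means are identical, a parametric test of the summary data gives p = 1, whereas the simulation returns the finite probability that rounding alone would make two random group means coincide.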

By December 2013, 11 trials with baseline data published by Saitoh and co-authors had been detected and analysed in this way: 6/11 had unlikely distributions of baseline data. It was also noticed that at least two others shared graph lines in common, which seemed unlikely, so a wider comparison of graphs in Saitoh's publications was undertaken. Graphs were copied and pasted, with transparency, on top of other graphs.

An additional six trials with baseline data that Saitoh had co-authored with Fujii had previously been analysed [3], but the association was not recognised at the time. After the submission to Anaesthesia and Intensive Care raised concerns, the Y Saitohs were identified as the same individual. A number of additional published trials and one unpublished trial (the paper submitted to Anaesthesia and Intensive Care) could then be added to the analysis. All analyses were conducted in R [6].
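
The comparison of a set of p values against a uniform distribution, described in the methods above, can be sketched as follows. This is a Python illustration rather than the R code the authors used, and it swaps in a Monte Carlo p value for the Kolmogorov–Smirnov statistic in place of the analytic distribution; the function names are hypothetical.

```python
import random


def ks_statistic_uniform(p_values):
    """Largest gap between the empirical CDF of the p values and the
    CDF of a uniform distribution on [0, 1]."""
    xs = sorted(p_values)
    n = len(xs)
    d = 0.0
    for i, x in enumerate(xs):
        # The empirical CDF steps from i/n to (i + 1)/n at x.
        d = max(d, (i + 1) / n - x, x - i / n)
    return d


def ks_uniform_p(p_values, sims=5000, seed=1):
    """Monte Carlo p value: the share of truly uniform samples of the same
    size whose KS statistic is at least as large as the observed one."""
    rng = random.Random(seed)
    n = len(p_values)
    observed = ks_statistic_uniform(p_values)
    exceed = 0
    for _ in range(sims):
        sample = [rng.random() for _ in range(n)]
        if ks_statistic_uniform(sample) >= observed:
            exceed += 1
    return observed, exceed / sims


# Hypothetical p values clustered near zero, as expected when baseline
# variables are more similar than chance would allow.
print(ks_uniform_p([0.002, 0.01, 0.03, 0.04, 0.08, 0.15, 0.2, 0.6]))
```

Under random sampling the p values of baseline variables should be spread evenly between 0 and 1, so a small Kolmogorov–Smirnov p value indicates a departure from that expectation.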


In addition to the unpublished trial submitted to Anaesthesia and Intensive Care, we retrieved 40 studies with Yuhji Saitoh as an author and for which Yoshitaka Fujii was not corresponding author (Appendix S2 [1–40]); in all we analysed baseline continuous data in 32 randomised, controlled trials (Appendix S2 [1–32]). Dr Saitoh was corresponding author for 26 of these trials (Appendix S2 [1–17, 19–21, 23–25, 30–32]), six of which have been retracted (Appendix S2 [12–17]) and one rejected before publication (Appendix S2 [32]). A further two randomised, controlled trials with Dr Saitoh as corresponding author that did not present baseline data have been retracted (Appendix S2 [33, 34]).

The baseline variables of 14/32 trials had combined p < 0.01 (one right-hand p value), of which seven were < 0.001 (Table 1 and Fig. 1). These p values are for the distribution of baseline means and rates and are less extreme than those calculated for 158 randomised, controlled trials with Yoshitaka Fujii as author (Fig. 2). The probability for distributions of standard deviations and their associated means can also be calculated. For example, both means and standard deviations are proximate in Fig. 3 of reference (Appendix S2 [24]), reproduced (with permission) in Fig. 3. The probability that a similar table would contain mean (SD) combinations as or more similar than reported was 0.0000089, determined in 100 million Monte Carlo simulations.
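
Combined p values of this kind rest on Stouffer's method, which converts each p value to a z score, sums the scores, and rescales by the square root of their number. A minimal Python sketch is given below; it assumes equal weights and one-sided p values strictly between 0 and 1, and it may differ in detail from the authors' R implementation (for example, in the level at which values are combined).

```python
from statistics import NormalDist


def stouffer(p_values):
    """Combine one-sided p values with Stouffer's unweighted z method.

    Each p value must lie strictly between 0 and 1; NormalDist.inv_cdf
    raises an error otherwise.
    """
    nd = NormalDist()
    # A small p value maps to a large positive z score.
    z_scores = [-nd.inv_cdf(p) for p in p_values]
    combined_z = sum(z_scores) / len(z_scores) ** 0.5
    # Convert the combined z score back to a one-sided p value.
    return nd.cdf(-combined_z)


# Hypothetical example: two independent p values of 0.05.
print(stouffer([0.05, 0.05]))
```

Because the z scores are summed, several moderately small p values combine into a much smaller one, while values near 1 pull the combined result back up.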

Table 1. The probabilities that simple random sampling would result in groups as similar as reported for: means (continuous variables); rates (categorical variables); continuous and categorical probabilities combined. Reference numbers are as listed in online Appendix S2
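| Reference (Appendix S2) | Year | Journal | Volume | 1st page | Corresponding author | p (means) | p (rates) | p (combined) |
|---|---|---|---|---|---|---|---|---|
| 1 | 1993 | BJA | 70 | 402 | Saitoh | 0.046 | 0.013 | 0.0037 |
| 2 | 1995 | BJA | 74 | 293 | Saitoh | 0.18 | 0.072 | 0.055 |
| 3 | 1995 | CJA | 42 | 992 | Saitoh | 0.40 | 0.00037 | 0.026 |
| 4 | 1995 | CJA | 42 | 1096ᵃ | Saitoh | 0.0096 | 0.0021 | 0.00027 |
| 5 | 1996 | CJA | 43 | 362 | Saitoh | 0.15 | 0.0021 | 0.0086 |
| 6 | 1997 | AA | 84 | 1354 | Saitoh | 0.45 | 0.0055 | 0.080 |
| 7 | 1997 | AAS | 41 | 741 | Saitoh | 0.14 | 0.013 | 0.019 |
| 8 | 1997 | CJA | 44 | 390 | Saitoh | 0.16 | 0.0011 | 0.0089 |
| 9 | 1997 | EJA | 14 | 327 | Saitoh | 0.98 | 0.00082 | 0.59 |
| 10 | 1998 | AAS | 42 | 851 | Saitoh | 0.22 | 0.0045 | 0.051 |
| 11 | 1998 | An | 53 | 244ᵃ | Saitoh | 0.0072 | 0.0023 | 0.0015 |
| 12 | 1998 | EJA | 15 | 524 | Saitoh | 0.013 | 0.0000015 | 0.0000097 |
| 13 | 1998 | EJA | 15 | 649 | Saitoh | 0.68 | 0.084 | 0.39 |
| 14 | 1999 | BJA | 82 | 329ᵇ | Saitoh | 0.015 | 0.012 | 0.0011 |
| 15 | 1999 | BJA | 83 | 275ᵇ | Saitoh | 0.13 | 0.0060 | 0.0093 |
| 16 | 1999 | AA | 89 | 1565ᵇ | Saitoh | 0.11 | 0.0022 | 0.0068 |
| 17 | 2001 | CJA | 48 | 28ᵇ | Saitoh | 0.028 | 0.0021 | 0.00090 |
| 18 | 2001 | AA | 93 | 1214 | Oshima | 0.57 | 0.049 | 0.34 |
| 19 | 2001 | BJA | 86 | 814 | Saitoh | 0.70 | 0.0022 | 0.046 |
| 20 | 2002 | JA | 16 | 102 | Saitoh | 0.61 | 0.20 | 0.47 |
| 21 | 2002 | An | 57 | 218 | Saitoh | 0.78 | 0.0045 | 0.34 |
| 22 | 2003 | An | 58 | 643 | Nakajima | 0.68 | | |
| 23 | 2003 | BJA | 90 | 480 | Saitoh | 0.019 | 0.073 | 0.00028 |
| 24 | 2003 | CJA | 50 | 342 | Saitoh | 0.0027 | 0.021 | 0.00012 |
| 25 | 2005 | CJA | 52 | 467 | Saitoh | 0.0.095 | 0.067 | 0.29 |
| 26 | 2005 | JCA | 17 | 276 | Hattori | 0.94 | 0.39 | 0.89 |
| 27 | 2005 | EJA | 22 | 20 | Hattori | 0.16 | 0.073 | 0.048 |
| 28 | 2007 | FJMS | 53 | 61 | Katayama | 0.81 | | |
| 29 | 2010 | JA | 24 | 168 | Oshima | 0.9999986 | 0.22 | 0.999961 |
| 30 | 2010 | JCA | 22 | 318 | Saitoh | 0.95 | | |
| 31 | 2012 | JA | 26 | 28 | Saitoh | 0.57 | 0.66 | 0.64 |
| 32 | 2015 | AIC | Unpublished | | Saitoh | 0.0018 | 0.060 | 0.0010 |

AA, Anesthesia and Analgesia; AAS, Acta Anaesthesiologica Scandinavica; AIC, Anaesthesia and Intensive Care; An, Anaesthesia; BJA, British Journal of Anaesthesia; CJA, Canadian Journal of Anesthesia; EJA, European Journal of Anaesthesiology; FJMS, Fukushima Journal of Medical Sciences; JA, Journal of Anesthesia; JCA, Journal of Clinical Anesthesia.
ᵃ Investigated, not retracted.
ᵇ Investigated, retracted.
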
Figure 1.

The cumulative distribution of p values for the means of 116 continuous variables from 32 randomised, controlled trials with Yuhji Saitoh as first and corresponding author. The distribution of p values was inconsistent with simple random sampling, p = 0.0011.

Figure 2.

The cumulative distribution of p values for the combined means of 32 randomised, controlled trials with Yuhji Saitoh as first and corresponding author (black markers). The distribution of p values was inconsistent with simple random sampling, p = 0.0023. The equivalent p values from 150 randomised, controlled trials with Yoshitaka Fujii as corresponding author (red markers) were less consistent with simple random sampling, p < 2 × 10⁻¹⁶.

Figure 3.

The distribution of means was analysed for all papers by Dr Yuhji Saitoh. This particular table was also analysed to determine the probability that random sampling would result in the distribution of standard deviations in association with their means, p = 0.0000089 (Appendix S2 [24]). Reproduced with permission.

Figure 4.

In 2006 results in Boldt et al.'s papers were shown to lack the variability expected due to chance [1]. This figure illustrates the same technique, applied to ratios of time taken for twitch numbers (1, 2, 3 or 4) to recover in eight randomised, controlled trials with Saitoh as corresponding author. The probabilities for the lack of heterogeneity were 0.0055 (T1), 0.031 (T2), 0.016 (T3) and 0.0071 (T4).

Eight papers (Appendix S2 [19, 21–23, 25, 27, 28, 30]) reported mean (SD) times for the recovery of each of the four train-of-four twitches (T1, T2, T3, T4) in two (or three) groups. The ratio of means for two groups varied little across these trials, ranging from 0.75 to 0.77 for all four twitches (Fig. 4). The Monte Carlo p values for the homogeneity (Q statistic) of these results were 0.0055 (T1), 0.031 (T2), 0.016 (T3) and 0.0071 (T4). These and other ratios of muscular function and post-tetanic count after neuromuscular blockade were presented graphically in 14 papers (Appendix S2 [5, 11, 16, 19, 23–25, 27, 30–32, 36, 37, 40]). The lines of some of these graphs were coincident, or nearly so, and are presented in Fig. 5 (all graphs reproduced with permission).
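
The homogeneity analysis can be illustrated with a short Python sketch that computes the Q statistic for standardised mean differences and then estimates, by simulation, how often random sampling would give a Q statistic as small as the one observed. The function names, the study data in the example and the choice to simulate each study's effect from a normal distribution around the pooled effect are all assumptions; the authors' R code may differ, and the tau statistic is not covered here.

```python
import random


def smd_and_variance(m1, sd1, n1, m2, sd2, n2):
    """Standardised mean difference (Cohen's d) and its approximate variance."""
    pooled_sd = (((n1 - 1) * sd1 ** 2 + (n2 - 1) * sd2 ** 2)
                 / (n1 + n2 - 2)) ** 0.5
    d = (m1 - m2) / pooled_sd
    var = (n1 + n2) / (n1 * n2) + d ** 2 / (2 * (n1 + n2))
    return d, var


def q_statistic(effects, variances):
    """Inverse-variance weighted Q statistic and the pooled effect."""
    weights = [1 / v for v in variances]
    pooled = sum(w * d for w, d in zip(weights, effects)) / sum(weights)
    q = sum(w * (d - pooled) ** 2 for w, d in zip(weights, effects))
    return q, pooled


def homogeneity_left_tail(studies, sims=20_000, seed=1):
    """Probability that chance alone gives a Q statistic no larger than the
    observed one (a small value means the results are unusually alike)."""
    rng = random.Random(seed)
    effects, variances = zip(*(smd_and_variance(*s) for s in studies))
    q_obs, pooled = q_statistic(effects, variances)
    hits = 0
    for _ in range(sims):
        simulated = [rng.gauss(pooled, v ** 0.5) for v in variances]
        if q_statistic(simulated, variances)[0] <= q_obs:
            hits += 1
    return q_obs, hits / sims


# Hypothetical studies: (mean, SD, n) for group 1, then for group 2.
example = [(30.1, 5.0, 15, 40.0, 6.0, 15),
           (29.8, 5.2, 15, 39.5, 6.1, 15),
           (30.4, 4.9, 15, 40.3, 5.8, 15)]
print(homogeneity_left_tail(example))
```

Ordinary meta-analysis looks at the right-hand tail of Q to detect excess heterogeneity; the left-hand tail is used here because the concern is results that agree more closely than sampling variation should allow.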

Figure 5.

Reproduced graphs of mean (SD) values plotted as lines in multiple graphs. The numbered references are listed in Appendix S2. The combined graphs (right column) are size-adjusted overlays of the two graphs to the left, reproduced from different publications. In each case at least one coincident or near-coincident curve can be identified, not consistent with random sampling. All graphs reproduced with permission.


We have found improbable distributions of baseline data (1 in 100,000,000 combined) and improbable homogeneity of results across a substantial number of studies published by Dr Yuhji Saitoh, mirroring similar findings in the previous analysis of the work of Yoshitaka Fujii.

Saitoh has co-authored 36 papers with Fujii, 11 with Saitoh as corresponding author, of which eight have already been retracted. The investigation into Fujii concluded that three trials authored by Saitoh were probably conducted and reported honestly (Appendix S2 [4, 10, 11]). Analyses of baseline data indicate that it is unlikely that two of these (Appendix S2 [4, 11]) reported the results of simple random allocation of participants into groups.

The possibility of a more widespread problem within a research network suggests that such institutional investigations should not be restricted to single authors. In the case of Boldt, for example, his co-authors published a paper without him [7] and this paper was also subsequently retracted.

The findings of this analysis support further institutional investigations into research published by Dr Yuhji Saitoh. As was also recommended in the case of Fujii [3], we think it is important that Dr Saitoh's data are excluded from meta-analyses or other reviews of the relevant subjects until these results can be explained.


The authors would like to acknowledge the help of Dr Neville Gibbs and Dr Steve Yentis in the preparation of this paper.

Competing interests

No external funding or competing interests declared. JC is an editor of Anaesthesia and this manuscript has undergone additional external review as a result.