Fraud or flawed: adverse impact of fabricated or poor quality research


“We must not say every mistake is a foolish one”

Marcus Tullius Cicero[1]

Our lives are littered with mistakes, big and small. Our best hope is that any effect they have on others and us is small and bearable. Academics, often working at some distance from the operating room, just hope that their mistakes are not so great as to alter conclusions or cause embarrassment.

This is not always the case. A Cochrane review of computer or web-based packages intended to change patient behaviour claimed (wrongly) that such interventions were bad for patients [2]. Within hours of this being trumpeted around the media, it was discovered that the review was incorrect because the authors had entered their data the wrong way round; the true conclusion was exactly the opposite of that originally claimed [3]. Despite the extensive peer review and editorial procedures to which Cochrane reviews in particular are subjected, this small but important mistake was missed.

The good news for authors caught up in such embarrassments is that while the original article may receive considerable media attention, retractions are usually ignored [4]: in one sample, only three of 50 retractions even made it into the newspapers [5]. As the old Welsh proverb says, ‘Bad news goes about in clogs, good news in stockinged feet.’

But a world away from a genuinely mistaken analysis, properly retracted, is the situation of wilful fraud, and the belief that outright major fraud can be undertaken with impunity. A meta-analysis of 18 surveys, with almost 12 000 scientists and medics responding, found that 2% of respondents reported fabricating, falsifying or altering their own data, and 15% said they knew of others doing so [6]. This accords with estimates that 1–13% of scientists know of an undisclosed case, though only one paper in every 5000–10 000 is actually documented as fraudulent [7, 8].

Nor is the problem just fabricated data. There is a taxonomy of misconduct, which also includes plagiarism [9], undeclared conflicts of interest [10], issues of authorship and ghostwriting, and covert duplicate publication [11, 12]. Fraud in science and medicine has been going on for a very long time, with some breathtaking examples – so many that you could write a book about it [13].

We have learned to be distrustful, because we know that fraud happens. In earlier years, fewer people had access to so many papers, and it was common to accept someone else’s word; the act of publication itself seemed to confer the certainty of truth. These days many more people have access to many more papers, and they read them more critically, which may increase the chance of detecting fraud. Examples of fraud exposed include claiming to have performed a pioneering operation [14] and making up enough clinical trial data to have at least 15 papers published in prestigious journals like the New England Journal of Medicine and the Lancet [15]. We know that some fraud is detected, but we don’t know how much goes undetected.

Anaesthesia is not without its scandals. Careful attention to detail demonstrated that a series of clinical trials on granisetron from Japan had curiously similar adverse event data [16]. The suspect trials provided 64% of the total trial information available, strongly influencing the overall results and dose-response relationships [17]. Systematic review methodology has highlighted other unlikely similarities among suspect trials [18].

Very recently, US anaesthesiologist Dr Scott Reuben admitted to fabricating data in at least 20 clinical studies, published mainly in US anaesthetic journals, which led the editors of those periodicals to issue a series of formal statements and retractions [19]; there may have been more suspect studies. Data from Reuben publications had been included in various systematic reviews and meta-analyses, and an investigation into how the fabricated data compromise these reviews has just been published [20]. Twenty-five systematic reviews cited 27 Reuben reports published between 1994 and 2007. Most examined analgesics in surgical patients. Five reviews did not consider Reuben data for inclusion, and six excluded Reuben reports because of quality or validity criteria.

Fourteen reviews included a Reuben report. Of the eight qualitative reviews without meta-analysis, in one all authors agreed that the review would certainly have reached different conclusions without the Reuben reports; in four more there was no unanimity; and in the remaining three all authors agreed that excluding the Reuben reports would have made no difference. One of the six quantitative reviews would have reached different conclusions without Reuben reports, but exclusion made no difference when patients from Reuben reports made up less than 30% of the total in the analysis.

This is an important message. Qualitative reviews are more vulnerable than quantitative ones, and vulnerability increases when the information comes from a small number of investigators.
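The dominance question above lends itself to a mechanical check: does any one research group contribute 30% or more of the pooled patients? The sketch below is illustrative only — the `dominance` helper and the trial figures are invented, and the 30% cut-off is simply the threshold reported in the re-analysis, not a validated rule.

```python
# Hypothetical sketch: flag meta-analyses in which one research group
# contributes >= 30% of the pooled patients (the threshold suggested by
# the Reuben re-analysis). All trial data below are invented.

def dominance(trials, threshold=0.30):
    """trials: list of (group_name, n_patients) tuples, one per trial.
    Returns {group: (share_of_total_patients, dominant?)}."""
    total = sum(n for _, n in trials)
    pooled = {}
    for group, n in trials:
        pooled[group] = pooled.get(group, 0) + n
    return {g: (n / total, n / total >= threshold) for g, n in pooled.items()}

trials = [("Group A", 60), ("Group A", 90), ("Group B", 40),
          ("Group C", 50), ("Group D", 60)]
print(dominance(trials))  # Group A supplies half the patients and is flagged
```

A review whose pooled estimate leans this heavily on one group warrants the extra scrutiny the text describes, whatever the statistical result.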

But we can never know how much is not detected. We know that peer review does not help, whether it is closed or open [13, 21]. Most experienced authors have major doubts about the value and quality of the peer review process. Using a smaller group of experienced reviewers might be more effective, but rejection rates would probably soar and journals would slim down. As things stand, the chances that peer review will pick up most forms of malpractice are slight. One might also question whether well-meaning groups like the Committee on Publication Ethics (COPE, founded in 1997 by a small group of journal editors) will make a difference, or whether theirs is just hand-wringing. Harsher consequences for ‘guest’ co-authors who do not check data – 15–20 co-authors in the Korean stem cell research case [10], and several co-authors on the Reuben papers – might reveal or prevent fabrication.

We are learning that systematic reviews can help detect plagiarism [9]. Simple meta-analytic tools may also point to places to look for problems. Figure 1 reworks data on granisetron [17] to show how a simple graphical representation concentrating on raw data can highlight where to start looking, compared with statistical outcomes removed from the raw data.

Figure 1.

 Results from individual trials of granisetron with the incidence of postoperative nausea and vomiting (PONV) with granisetron plotted against the incidence of PONV with placebo (%). The line of equality is shown; the diameter of each circle is proportional to the size of each published clinical trial (the inset box indicates the number of patients for the given diameter). (a) data from a single dominating centre; (b) data from other centres. Note that the data from the single centre come from smaller trials showing greater favourable effects for granisetron, while data from other centres are from larger trials showing a less dramatic effect of granisetron. Data from [17].

While carefully performed systematic reviews proved largely robust against the impact of fabricated data, it is clear that problems can occur in two particular circumstances: when fabricated data dominate the evidence base; and in qualitative reviews when issues of size and quality are overlooked and allow bias to creep in [22]. We know that small datasets of limited quality are more likely to mislead than large datasets of high quality. This unremarkable statement needs to be relearned frequently, despite many restatements in different forms [22, 23]. Initial results are often contradicted by later, more careful, research [24]. When looking at data, we should ask whether size and methodological quality and validity are adequate to avoid effects of random chance and bias [22], and also whether data from one individual or research group dominate; if so, is it ‘too good to be true’, with less variability than would be expected by chance?
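The ‘too good to be true’ question at the end of the paragraph above can be made concrete by simulation: under a single common event rate, how often would chance alone produce results as mutually consistent as those observed? The sketch below is a rough illustration, not an established test — the function name and the trial data in the usage example are invented, and a real investigation would use a formal statistical method.

```python
import random

def too_consistent(trials, sims=2000, seed=0):
    """trials: list of (n_patients, n_events) per trial.
    Simulates each trial under the pooled event rate and returns the
    fraction of simulated datasets whose between-trial spread of event
    rates is no larger than the observed spread. A fraction near zero
    means the real data vary LESS than chance alone would produce."""
    rng = random.Random(seed)
    p = sum(e for _, e in trials) / sum(n for n, _ in trials)  # pooled rate

    def spread(rates):
        m = sum(rates) / len(rates)
        return sum((r - m) ** 2 for r in rates) / len(rates)

    observed = spread([e / n for n, e in trials])
    as_tight = 0
    for _ in range(sims):
        sim = [sum(rng.random() < p for _ in range(n)) / n for n, _ in trials]
        if spread(sim) <= observed:
            as_tight += 1
    return as_tight / sims

# Invented example: six trials all reporting exactly 10/50 events.
suspect = [(50, 10)] * 6
print(too_consistent(suspect))  # a tiny fraction: implausibly uniform results
```

Six trials reporting identical event rates are possible, but the simulation shows how rarely chance produces such uniformity, which is exactly the signal that prompted closer inspection of the granisetron data.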

Systematic reviews are meant to provide the highest level of evidence, and Cochrane reviews in particular are held in high regard. All too often, though, systematic reviews are bedevilled by small numbers of small studies, poorly designed, with dubious outcomes measured at irrelevant times. Multiple statistical testing produces sporadic statistical significance of little clinical importance or relevance. Too much is made of too little, and all too often such results are accepted by people willing to use them to support preconceived notions.

If that sounds harsh, consider a systematic review of systematic reviews of acupuncture [25]. Sixteen of the 33 systematic reviews came to a positive conclusion (i.e. supporting efficacy for acupuncture), and in six of these (including Cochrane reviews) the conclusion was strongly positive. None of the 16 survived a more considered analysis that accepted only evidence of sufficient quality to avoid bias, eliminated evidence considered invalid (e.g. short-term outcomes in long-term conditions) and took into account the effects of chance in small trials.

Reuben’s fabricated data may have had impact beyond systematic review conclusions because they addressed topical questions for which anaesthetists, surgeons and patients seek answers, such as the utility of multimodal analgesia, or whether non-steroidal anti-inflammatory drugs (NSAIDs) influence bone healing. Wise heads usually wait for confirmation when topics are fashionable. Inadequate and misleading systematic reviews have the potential to do great damage, simply because of the regard in which they are held as ‘evidence’, and because they influence clinical guidelines that use them (sometimes uncritically) in the effort to be evidence-based.

What can we do to improve detection of fraud? The first step is to acknowledge that there is a threat, that some people – for reputational gain or other reasons – are prepared to perpetrate fraud on the community. There is an analogy here to detection of terrorism. The fraudster or the terrorist is always likely to be one step ahead of the authorities. Somebody prepared to publish fraudulent data is not going to be deterred by a box-ticking process along the road to publication. They will just lie.

It seems to us that there are two yellow flags for us all to be aware of, as reviewers, editors or readers. One is when the biology makes little sense. An example would be NSAIDs and bone healing: millions of people have had NSAIDs after fractures, trauma or orthopaedic surgery without problems of bone healing, so the plausibility of a sizeable negative effect of NSAIDs on bone healing has to be questioned [26, 27]. The second yellow flag is the difficult issue of a small number of investigators producing the bulk of the data. Over the years there have been several instances in peri-operative pain clinical research where investigators have published on implausibly large numbers of patients having particular procedures at a single institution over a short period [28]; an example is the use of anti-epileptic medication in postoperative pain [28]. This is, of course, easier for a reviewer of published papers to spot than for the peer review process or the editor handling any individual paper.
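A back-of-envelope version of that second flag is simply to ask what recruitment rate the reported figures imply and whether the institution could plausibly sustain it. All the numbers below are invented for illustration; no real publication is being described.

```python
# Hypothetical plausibility check for the second yellow flag: could one
# institution really have recruited the reported number of patients
# having a given procedure in the stated period? Figures are invented.

def required_monthly_rate(patients_reported, months):
    """Recruitment rate (patients/month) implied by a published series."""
    return patients_reported / months

reported = required_monthly_rate(1200, 24)   # invented: 1200 patients in 2 years
plausible = 20                               # invented: cases/month the unit performs
print(reported, reported <= plausible)       # 50.0 False -> implausible series
```

The arithmetic is trivial, which is rather the point: a reviewer with an overview of several papers from the same group can do this sum, whereas the editor handling a single manuscript usually cannot.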

The biggest mistake of all, then, is taking evidence on trust and without checking it. In a world claiming to be evidence-based, the only safe strategy is to know your evidence.