Volume 20, Issue 6 p. 908-914
Original Article
Open Access

How evidence-based medicine is failing due to biased trials and selective publication

Susanna Every-Palmer MBChB FRANZCP MSc

Corresponding Author

Consultant Forensic Psychiatrist

Te Korowai Whariki, Capital and Coast District Health Board, Porirua, New Zealand

Correspondence

Susanna Every-Palmer

Te Korowai Whariki

Capital and Coast District Health Board

PO Box 50-233

Porirua, NZXX0034

New Zealand

E-mail: susanna.every-palmer@ccdhb.org.nz

Jeremy Howick BA MSc PhD

Research Fellow

Oxford Centre for Evidence-Based Medicine, Primary Care Health Sciences, University of Oxford, Oxford, UK

First published: 12 May 2014

Abstract

Evidence-based medicine (EBM) was announced in the early 1990s as a ‘new paradigm’ for improving patient care. Yet there is currently little evidence that EBM has achieved its aim. Since its introduction, health care costs have increased while there remains a lack of high-quality evidence suggesting EBM has resulted in substantial population-level health gains. In this paper we suggest that EBM's potential for improving patients' health care has been thwarted by bias in the choice of hypotheses tested, manipulation of study design and selective publication. Evidence for these flaws is clearest in industry-funded studies. We argue EBM's indiscriminate acceptance of industry-generated ‘evidence’ is akin to letting politicians count their own votes. Given that most intervention studies are industry funded, this is a serious problem for the overall evidence base. Clinical decisions based on such evidence are likely to be misinformed, with patients given less effective, harmful or more expensive treatments. More investment in independent research is urgently required. Independent bodies, informed democratically, need to set research priorities. We also propose that evidence rating schemes are formally modified so research with conflict of interest bias is explicitly downgraded in value.

The lack of evidence for the benefits of EBM

Evidence-based medicine (EBM) is defined as the conscientious and judicious use of current best evidence in conjunction with clinical expertise and patient values to guide health care decisions 1, 2.

The question of what might constitute ‘best evidence’ is addressed in levels of evidence tables such as the one produced by the Oxford Centre for Evidence-Based Medicine 3. As in most other evidence-ranking schemes, systematic reviews of randomized trials are placed at the apex of the evidence pyramid, with mechanistic reasoning and ‘expert opinion’ ranked at the bottom. It seems ironic, then, that although there are good rationales for why EBM should benefit the population, little ‘high quality’ (according to EBM standards) empirical evidence exists that it does. In this respect little has changed since the ‘Users Guide to Evidence-Based Medicine’ was first published in 1992:

Our advocating EBM in the absence of definitive evidence of its superiority in improving patient outcomes may appear to be an internal contradiction … When definitive evidence is not available, one must fall back on weaker evidence … and on biologic rationale. The rationale in this case is that physicians who are up-to-date as a function of their ability to read the current literature critically, and are able to distinguish strong from weaker evidence are likely to be more judicious in the therapy they recommend … [and] make more accurate diagnoses 4.

The authors went on to suggest that ‘until more definitive evidence is adduced’ the adoption of EBM should ‘appropriately be restricted’ to three groups: those who found the rationale compelling, those who wished to test EBM in educational trials and those who found ‘the practice of medicine in the new paradigm is more exciting and fun’ 4. The first group was large, and by the early 2000s the EBM movement was widely described as a health care ‘revolution’ 5, 6, being hailed by Time Magazine in 2001 as one of the most influential contemporary ideas 7. EBM's subsequent rise to ascendancy as the prevailing medical paradigm has been called ‘meteoric’ 8.

EBM has been shown to improve practice in specific areas. For example, stroke and myocardial infarction aftercare was improved in light of new evidence 9, 10, and some harmful practices have been reduced when trials revealed that the risks outweighed the benefits (e.g. postmenopausal hormone replacement therapy) 11, 12. These examples are promising but anecdotal. In another example, a study comparing EBM-trained McMaster graduates to ‘traditionally’ trained peers found the former more knowledgeable about hypertension guidelines at least 5 years after graduation 13. However, this outcome is a dreaded surrogate end point. How can we know that better knowledge of guidelines translates to better patient outcomes? Or that the time spent learning critical appraisal did not displace an essential part of the syllabus? Maybe McMaster graduates excelled in treating hypertension, but kept missing paediatric meningitis.

If EBM were the revolutionary movement it was hailed as, we would expect more than benefits demonstrated in specific cases. We would expect population-level health gains, such as those that occurred after the introduction of antibiotics, improved sanitation and smoking cessation 14. Unfortunately, there is little evidence that EBM has had such effects.

No randomized trial accurately addressing the population outcomes of EBM is likely to be forthcoming, as the methodological challenges of sample size, contamination, blinding, follow-up and outcome measures are hard to overcome.

The macro-level evidence we do have suggests that health care costs continue to rise 15, gains in hard outcomes such as mortality are plateauing (e.g. http://www.mortality-trends.org) and trust in medical professionals is decreasing 16. Given that EBM firmly favours an empirical approach over expert opinion and mechanistic rationale, it is ironic that its widespread acceptance has been based on expert opinion and mechanistic reasoning, rather than on EBM ‘evidence’ that it actually works.

Have industry-funded randomized trials inhibited the performance of EBM?

There are a number of possible explanations for the absence of data suggesting that EBM has resulted in material gains across the board in health care. It may be that we are nearing the limits of medicine, and gains from anywhere are hard to come by. On this view the low-hanging fruit have already been picked (such as notable triumphs over infectious diseases), and the remaining, chronic, complex illnesses are simply more difficult to address (such as mental illness, diabetes, heart disease, cancer, Alzheimer's). If this is the case, then spending money on any strategy whose effects we cannot measure is difficult to justify.

Another possibility is that there is something inherently flawed in the EBM ‘philosophy’ and that implementing it will not result in health gains. This remains possible, although one of us (JH) has published an extensive defence of the EBM philosophy 17, which addresses this concern.

The non-exclusive hypothesis we will explore in detail is that EBM has not demonstrated an overall benefit because it has not been implemented effectively. Specifically, we will argue that a cornerstone of EBM methodology, the randomized trial, has often been corrupted by vested interests involved in the choice of hypotheses tested in trials and in the conduct and selective reporting of such trials. We will support our argument with examples from psychiatry, where the problems with corruption of randomized trials are dramatic.

Clinical example: prescribing antipsychotics and antidepressants according to the evidence?

Psychotic disorders such as schizophrenia affect 24 million people worldwide 18 and are highly debilitating conditions. Antipsychotic medications are big business, currently comprising the largest class of pharmaceuticals by sales in the United States. Their international market was estimated at $19.6 billion in 2010 19.

Depression is the third leading cause of disability worldwide 20. The global antidepressant market was valued at $11.9 billion in 2011, with a compound annual growth rate of 1.7% predicted to continue into the foreseeable future 21.

These are important conditions and much research (and money) has gone into developing evidence-based practices for managing them. Let us consider the prevailing views for treating psychosis and depression in the 1990s–2000s.

Psychosis

There are two categories of medication used for treating psychosis: the first generation or ‘typical’ antipsychotics developed in the 1950s and 1960s (e.g. perphenazine, haloperidol, chlorpromazine), and the second generation or ‘atypical’ antipsychotics developed in the 1990s (e.g. olanzapine, quetiapine, risperidone). The typicals are inexpensive. Throughout the 1990s–2000s the atypicals remained on patent and were expensive.

Following the introduction of the atypicals, many high-quality randomized trials and reviews were published establishing that they were better tolerated and more effective than their predecessors. Having ‘effective treatments’ changed the demographics of diagnosis. Once atypicals were ‘proven’ effective for bipolar disorder, rates of diagnosis rose dramatically, especially in children. The number of children and adolescents treated for bipolar disorder in the United States rose 40-fold between 1994 and 2000 22.

In one author's clinical practice (SE-P) in the mid-2000s, all psychotic patients were prescribed atypicals. Although at that time atypicals cost approximately $4000 more per year per patient than typicals 23, the ‘best evidence’ was that the advantages justified the difference in cost. In favouring atypicals, we had appraised the evidence and were following the guidelines. And we were not alone. More than 90% of antipsychotic prescriptions were written for atypicals 24.

Depression

The selective serotonin reuptake inhibitor (SSRI) class of antidepressants was also developed in the EBM era. Before the advent of the SSRIs, tricyclics were the mainstay of pharmacological treatment for depression, but during the 1990s the more expensive SSRIs were aggressively marketed as safer, effective and well-tolerated alternatives. And this appeared to be backed up by the evidence. Few medications have had as many double-blind, placebo-controlled trials performed to demonstrate their efficacy and gain regulatory approval as the SSRIs. Over a thousand antidepressant randomized trials have been conducted 25, and statistically significant benefits have been repeatedly demonstrated. Clinicians and patients found this huge body of evidence reassuring, and SSRIs such as Prozac (fluoxetine) quickly became ‘blockbuster’ drugs, supplanting other antidepressants and psychological treatments (such as cognitive behaviour therapy) as the most commonly recommended treatment for depression. Their use has become so widespread that 1 in 10 Americans over the age of 12 take antidepressants, and they are the medication most frequently used by young and middle-aged adults 26.

A clinical success story and victory for EBM?

The story so far suggests improved patient outcomes and EBM's ability to identify superior treatments to replace less effective alternatives. However, the reality is different. Ten years after atypicals had saturated the market, large independent trials known by the acronyms CATIE (Clinical Antipsychotic Trials of Intervention Effectiveness), CUtLASS (Cost Utility of the Latest Antipsychotic Drugs in Schizophrenia Study), and EUFEST (European First Episode Study) have demonstrated that the atypical agents are in fact no more effective, no better tolerated and are less cost effective than their typical predecessors 23, 27-29.

In relation to depression, independent meta-analyses pooling unpublished as well as published data now show that SSRIs are no more effective than placebo in treating mild-to-moderate depression, the condition for which they have been most commonly prescribed 30, 31.

So how is it that for over a decade the evidence convinced us these treatments were superior? How could there have been ‘an evidence myth constructed from a thousand randomized trials’ 25, and how did we fall for it?

What went wrong with randomized trials?

It seems that something about the way randomized trials are implemented in the real world has undermined their reliability. We explore this issue by asking:
  • Who funds randomized trials and does the funding source matter?
  • How are randomized trials selected and what questions are asked?
  • Which trials are more likely to make it into the literature?
  • How are studies identified when their conclusions are discredited?

Who funds randomized trials and does the funding source matter?

Firstly, it has become apparent that most of the medical evidence base has been funded by industry, although often these financial relationships have not been disclosed. Between two-thirds and three-quarters of all randomized trials in major journals have been shown to be industry funded 32, 33.

Secondly, there is strong evidence that industry-funded studies produce results that differ from independently funded studies. Compared with independent trials, industry-sponsored trials exaggerate treatment effects in favour of the products preferred by their sponsor 34-37.

Although industry influence has been pervasive across medicine, psychiatry has been at the epicentre of much of the controversy about funding source bias and conflict of interest (e.g. 38-40). Among randomized, double-blind, placebo-controlled studies in psychiatric journals, those that reported conflict of interest were five times more likely to report positive results 33.

Heres et al. reviewed industry-funded randomized trials comparing atypical antipsychotics to determine whether a relationship existed between the sponsor and the study outcome 41. It did. Ninety per cent of trials showed superiority of the sponsor's drug. The resulting circularity was captured in the study's title, ‘Why olanzapine beats risperidone, risperidone beats quetiapine, and quetiapine beats olanzapine’: in different trials comparing the same two drugs, the sponsor's drug almost always triumphed. This implausible result was not due to publication bias. The studies had simply been designed in a way that would virtually guarantee the favoured drug would ‘win’ – for example, the comparator drug was dosed too low to be effective, or so high that it would produce intolerable side effects. Exclusion or inclusion of specific patients, placebo lead-in periods, short follow-up, choice of imputation technique, the use (or not) of statistical adjustments and selective outcome reporting can also inflate effect-size estimates. Some of these methods are summarized by Smith 42 in Box 1.
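Why selective outcome reporting works so well can be seen with simple arithmetic. The following is a minimal Python sketch, assuming a drug with no real effect, several independent end points, and each end point tested at the conventional α = 0.05; real end points are usually correlated, which lowers the figure somewhat. The function name is ours and purely illustrative.

```python
# Minimal sketch: chance that a useless drug shows at least one "significant"
# end point when k independent end points are each tested at alpha = 0.05.
# Assumes independence between end points (an idealization).


def prob_at_least_one_false_positive(k: int, alpha: float = 0.05) -> float:
    """Probability that at least one of k independent null end points
    reaches p < alpha purely by chance: 1 - (1 - alpha)^k."""
    if k < 1:
        raise ValueError("k must be at least 1")
    return 1.0 - (1.0 - alpha) ** k


if __name__ == "__main__":
    for k in (1, 5, 10, 20):
        print(f"{k:>2} end points: {prob_at_least_one_false_positive(k):.1%}")
```

With ten end points the chance of at least one spurious ‘positive’ is about 40%, so a sponsor who publishes only the favourable end points can usually find something to report.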

Box 1. Examples of methods pharmaceutical companies can use to get the results they want from clinical trials (from Smith 2005 42)

  • Conduct a trial of your drug against a treatment known to be inferior.
  • Trial your drugs against too low a dose of a competitor drug.
  • Conduct a trial of your drug against too high a dose of a competitor drug (making your drug seem less toxic).
  • Conduct trials that are too small to show differences from competitor drugs.
  • Use multiple end points in the trial and select for publication those that give favourable results.
  • Do multicentre trials and select for publication results from centres that are favourable.
  • Conduct subgroup analyses and select for publication those that are favourable.
  • Present results that are most likely to impress – for example, reduction in relative rather than absolute risk.
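The last item in Box 1 is easy to illustrate. The following is a minimal Python sketch with hypothetical numbers, not taken from any trial discussed here: if an outcome occurs in 2% of control patients and 1% of treated patients, the relative risk is halved while the absolute risk falls by only one percentage point. Function and key names are illustrative.

```python
from fractions import Fraction


def risk_reductions(control_events: int, control_n: int,
                    treated_events: int, treated_n: int) -> dict:
    """Return relative risk reduction (RRR), absolute risk reduction (ARR)
    and number needed to treat (NNT), as exact fractions."""
    control_risk = Fraction(control_events, control_n)
    treated_risk = Fraction(treated_events, treated_n)
    if control_risk == 0:
        raise ValueError("control group must have at least one event")
    arr = control_risk - treated_risk      # absolute risk reduction
    rrr = arr / control_risk               # relative risk reduction
    nnt = 1 / arr if arr > 0 else None     # patients treated per event avoided
    return {"RRR": rrr, "ARR": arr, "NNT": nnt}


if __name__ == "__main__":
    # Hypothetical trial: 20/1000 events on control, 10/1000 on the new drug.
    r = risk_reductions(20, 1000, 10, 1000)
    print(f"Relative risk reduction: {float(r['RRR']):.0%}")
    print(f"Absolute risk reduction: {float(r['ARR']):.1%}")
    print(f"Number needed to treat:  {r['NNT']}")
```

The same hypothetical result can be headlined as a ‘50% reduction in risk’ or reported as a 1.0% absolute reduction, meaning 100 patients must be treated to prevent one event.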

Further evidence of biased data comes from court cases against pharmaceutical companies. In 2012 GlaxoSmithKline (GSK) was fined a record $3 billion for multiple criminal and civil offences, including the unlawful promotion of medicines, failure to report safety data and false reporting 43. This included creating ‘misleading journal articles’, falsely claiming efficacy for paroxetine (an SSRI) in people under 18 and not disclosing negative trials. In fact there was no evidence that paroxetine was effective in teenagers, and there was a small but real increase in suicide risk 44. GSK was also charged with creating ‘sham advisory boards and supposedly independent CME programs’ 45.

GSK is not an outlier. Johnson & Johnson, manufacturer of the atypical antipsychotics risperidone and paliperidone, has recently pleaded guilty to a criminal misdemeanour in its marketing of risperidone 46. The company paid $2.2 billion in criminal and civil fines in 2013 and was fined $1.2 billion in 2012 for deceptive practices including hiding risks and exaggerating benefits 45, 47. In 2009 Eli Lilly settled litigation for $1.4 billion for illegally marketing olanzapine and allegedly concealing its risks. In 2010 AstraZeneca paid $520 million to settle allegations that it had illegally marketed quetiapine (also an atypical) and hidden its adverse effects. Internal documents published as part of that court action provide considerable insight. Emails show a senior company official discussing strategies that might put ‘a positive spin on this cursed study’. ‘Lisa [company physician] has done a great smoke-and-mirrors job’, he says approvingly. ‘Thus far, we have buried trials 15, 31, 56’, writes a publications manager. ‘The larger issue is how do we face the outside world when they begin to criticize us for suppressing data?’ 48.

How are randomized trials selected and what questions are asked?

As well as biased results from randomized trials, the selection of treatments tested and questions asked is driven by what is likely to be profitable as well as what is likely to benefit patients. Unfortunately, patient benefits and profit can pull in opposite directions.

Although the number of randomized trials is rising exponentially, the amount of evidence available on an intervention or practice tends to correlate with its commercial rather than its clinical importance. That is, randomized trials are expensive, and so industry-funded studies naturally focus on potentially lucrative treatments 49 such as new drugs, drugs remaining on patent, expensive drugs or drugs considered to have wide commercial appeal. It is telling that, in comparison with (patentable) drug therapy, there have been very few clinical trials of exercise for treating depression, an intervention suggested by a recent Cochrane review to be of equivalent efficacy to conventional drug treatment 50. Whether exercise is useful for treating depression is highly clinically relevant, yet the question has little commercial value because exercise cannot be patented.

Conversely, many clinically unimportant questions (or questions already adequately addressed) are extensively researched while important questions are neglected. For example, thousands of randomized trials compare the effectiveness of similar antipsychotics and none investigate the effective treatment of antipsychotic-induced constipation, a distressing adverse effect occurring in up to 60% of antipsychotic-treated patients, which can progress to fatal bowel obstruction 51.

Which trials are more likely to make it into the literature?

Pharmaceutical companies have a natural incentive to promote results that are favourable to their products and to minimize results that are unfavourable. In the 1990s, a Wyeth employee overwrote computer files obliterating the evidence that their diet drug fen-phen caused valvular heart disease 52. A more cautious (and popular) approach is simply not publishing.

The selective publication of positive results and non-publication of negative results is known as publication bias. The current best estimate is that half of all completed clinical trials have never been published in academic journals, and some have never been registered 53. Publication bias occurs for industry and non-industry trials, and for trials of all sizes 54. This significantly distorts the evidence base.

Turner et al. examined the question ‘how accurately does the published literature convey data on drug efficacy to the medical community?’, looking specifically at antidepressants 55. They examined all completed antidepressant trials registered with the Food and Drug Administration (FDA) (n = 74), resorting to the Freedom of Information Act to obtain the complete data, as one-third of the studies remained unpublished. In 38 trials the studied antidepressant was more effective than the comparator (placebo or another active treatment); we will call these trials ‘positive’. In the other 36 ‘negative’ trials the studied antidepressant was not more effective. The researchers then examined the publication fate of these trials, and whether a trial was published was strongly associated with its outcome. Thirty-seven of the 38 positive trials were published. Of the 36 negative trials, however, only three were published in a way that accurately conveyed their results. Twenty-two were not published at all, and 11 were published in a way that falsely conveyed a positive outcome. This is depicted in Fig. 1.

Fig. 1. The publication of antidepressant trials (from Turner et al. 55).

In summary, in the literature available to the prescriber, 94% of antidepressant trials appeared positive. However, in reality only 51% of the completed trials in the FDA database were positive, resulting in a 32% overestimation of effect size 54.
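The two percentages in the summary follow directly from the trial counts reported for Turner et al. 55. The following is a small Python sketch that reproduces them; the function and argument names are ours. The 32% effect-size figure comes from a separate effect-size analysis and cannot be recovered from these counts alone.

```python
# Reproduces the proportions quoted for Turner et al.'s 74 FDA-registered
# antidepressant trials from the counts given in the text.


def publication_summary(pos_published: int, pos_unpublished: int,
                        neg_accurate: int, neg_unpublished: int,
                        neg_spun: int) -> dict:
    """Compare the share of trials that were truly positive with the share
    that appear positive in the published literature."""
    total = (pos_published + pos_unpublished
             + neg_accurate + neg_unpublished + neg_spun)
    truly_positive = pos_published + pos_unpublished
    published = pos_published + neg_accurate + neg_spun
    appear_positive = pos_published + neg_spun   # spun negatives look positive
    return {
        "total": total,
        "published": published,
        "true_positive_share": truly_positive / total,
        "apparent_positive_share": appear_positive / published,
    }


if __name__ == "__main__":
    s = publication_summary(pos_published=37, pos_unpublished=1,
                            neg_accurate=3, neg_unpublished=22, neg_spun=11)
    print(f"Truly positive:          {s['true_positive_share']:.0%} of {s['total']} trials")
    print(f"Apparently positive:     {s['apparent_positive_share']:.0%} of {s['published']} published")
```

Only 51 of the 74 trials reach print, and 48 of those 51 look positive (94%), whereas 38 of 74 (51%) actually were.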

This is the only study that we report in depth, so a valid rebuttal would be that we have ‘cherry picked’ an exceptional example that best supports our thesis, being guilty of exactly the same bias we are attacking. However, this is not the case. There is a large body of evidence including several systematic reviews that all show strong evidence of publication bias 36, 52, 56, 57. Three systematic reviews looking at all the studies ever published – not a small undertaking – found that overall industry-funded studies were two to four times more likely to report favourable results 36, 52, 55.

Although the EBM approach decries publication bias and strives to identify it, it cannot do so very accurately. Numerous methods are available, but none can identify or rule out publication bias entirely 54, 58-61. These methods (such as funnel plots) are deployed at the systematic review level: they plot the results from all available studies and use statistical modelling to detect gaps. They are useful techniques, but resemble partially equipped ambulances waiting at the bottom of the cliff.
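As an illustration of the kind of statistical modelling involved, the following sketch implements Egger's regression test for funnel-plot asymmetry, one common method of this type; it is our choice of example, not one prescribed by the cited studies. It regresses each study's standardized effect (effect divided by its standard error) on its precision (one over the standard error) by ordinary least squares; an intercept well away from zero suggests that small, imprecise studies report systematically different effects. The effect sizes in the usage example are hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass
class EggerResult:
    intercept: float      # bias coefficient; near 0 for a symmetric funnel
    slope: float          # approximates the underlying common effect
    intercept_se: float   # standard error of the intercept


def egger_test(effects, std_errors) -> EggerResult:
    """Egger's regression: regress effect/SE on 1/SE by least squares."""
    n = len(effects)
    if n != len(std_errors) or n < 3:
        raise ValueError("need at least three studies with matching SEs")
    x = [1.0 / se for se in std_errors]                   # precision
    y = [e / se for e, se in zip(effects, std_errors)]    # standardized effect
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("studies must differ in precision")
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    # Residual variance with n - 2 degrees of freedom, then SE of intercept.
    rss = sum((yi - intercept - slope * xi) ** 2 for xi, yi in zip(x, y))
    sigma2 = rss / (n - 2)
    intercept_se = math.sqrt(sigma2 * (1.0 / n + x_bar ** 2 / sxx))
    return EggerResult(intercept, slope, intercept_se)


if __name__ == "__main__":
    # Hypothetical meta-analysis in which smaller studies report larger effects.
    effects = [0.62, 0.55, 0.71, 0.38, 0.95, 0.30]
    ses = [0.30, 0.25, 0.35, 0.12, 0.45, 0.10]
    res = egger_test(effects, ses)
    print(f"intercept = {res.intercept:.2f} (SE {res.intercept_se:.2f}), "
          f"slope = {res.slope:.2f}")
```

Consistent with the point above, a near-zero intercept does not show that the literature is complete. If trials are missing evenly across study sizes, the funnel can still look symmetric.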

How are biased randomized trials identified when their conclusions are discredited?

Bad evidence lingers. Biased randomized trials are not clearly labelled as such once discredited. Once papers enter the electronic literature there they remain. There is no vigilant cyber librarian who stamps ‘retracted’ across them if they are subsequently refuted, and they may continue to mislead. If a busy health practitioner does a quick keyword search, the discredited randomized trial may be the first to appear, with no identification of its flaws. Even informed critics may be misled. For example, Tatsioni et al. found 50% of academic reviews promoted a discredited intervention (vitamin E for heart disease) 5 years after it had been convincingly proven ineffective 62.

Discussion

Counting your own votes

The evidence we have presented dictates that trials should be conducted by independent bodies.

Common sense suggests the same thing. Imagine the government proposed disbanding the electoral commission in favour of letting politicians count their own votes. This would not be accepted, for various reasons. Politicians are not objective. They have invested time and money campaigning. They believe in their party. They want to win. The less honest politician might fabricate results. The more honest might approach the task with sincerity, but be subconsciously influenced into judging incomplete ballot forms valid when these endorse them. For these reasons the results would not be trusted, and the whole exercise would be a waste of time and money. If this is so, why is accepting industry's ‘vote counting’ on the efficacy of its own products any less flawed? And we do not need a thought experiment to explore whether the results are biased. Real experiments have repeatedly shown this.

Self-funding as a false economy

One might object by noting that randomized trials are very expensive. It was perhaps thought that industry-funded randomized trials represented a happy coincidence between commercial self-interest and the public good. However, this has been a false economy. Not only have the research costs incurred by industry been recouped from the public 63, but the resulting evidence base is neither robust nor reliable. Moreover, as patients end up paying for the treatments (via taxation, insurance policies or out of pocket), the least-biased method for evaluating treatments would seem to be in their interest. Certainly, what we have written here suggests that patients would save money in the long run if trials were independent.

Industry is not the only bias

One might argue that it is unfair to condemn industry for being biased when all humans have biases. It would be naive to think that publicly funded trials were free from bias. This argument is entirely valid, and when evidence about other biases is presented we should act on it. However, it is not an excuse for failing to respond to the large industry-funding biases of which we are currently aware.

Would better evidence result in better evidence for the effectiveness of EBM?

Our argument in relation to the performance of EBM is not an ‘in principle’ argument, but a contingent one. It is our belief that if we had unbiased randomized trials we would have a better evidence base that, if implemented, ought to lead to tangible and measurable health benefits. Although this is a matter of speculation (it is possible that even with entirely unbiased evidence, EBM would not result in demonstrable population benefits), it is a hypothesis worth exploring. We set out below potential solutions to the biased nature of industry-funded randomized trials.

Towards a solution

There are plenty of polemics on the evils of Big Pharma, and calls for greater industry accountability and regulation. We endorse these approaches, but consider them inadequate. It is naïve to think that we can prevent vested interests from introducing bias. Politicians cannot tally their votes and in sport we rely on umpires, not players, to call the penalties. What are we thinking relying on industry to provide evidence about health interventions that they have developed, believe in and stand to profit from? We need to recognize this inherent bias and take action against it.

It is beyond the scope of this paper to discuss practical solutions in great detail; however, we make the following suggestions:
  1. The sensible campaign to formalize and enforce measures ensuring the registration and reporting of all clinical trials (see http://www.alltrials.net/) should be supported – otherwise trials that do not give the answer industry wants will remain unpublished.
  2. More investment in independent research is required. As we have described, it is a false economy to indirectly finance industry-funded research through the high costs of patented pharmaceuticals.
  3. Independent bodies, informed democratically, need to set research priorities.
  4. Individuals and institutions conducting independent studies should be rewarded for the methodological quality of their studies and not for whether they manage to get a positive result (a ‘negative’ study is as valuable as a ‘positive’ one from a scientific point of view).
  5. Risk of bias assessment instruments such as the Cochrane risk of bias tool 64 should be amended to include funding source as an independent item.
  6. Evidence-ranking schemes need to be modified to take the evidence about industry bias into account. There are already mechanisms within EBM evidence-ranking schemes to up- or downgrade evidence based on risk of bias. For example, the Grading of Recommendations Assessment, Development and Evaluation (GRADE) system allows for upgrading observational evidence demonstrating large effects, and downgrading randomized trials for failing to adequately conceal allocation (among various other factors) 65. However, such schemes are currently agnostic to the origins of evidence and do not expressly recognize the high risk of bias when the producers of evidence have a vested interest in the results. It would be easy to introduce an evidence quality item based on whether a trial was conducted or funded by a body with a conflict of interest; if so, the evidence could be downgraded. Given the failure of current evidence-ranking schemes to detect and rule out industry-funding bias, this is a necessary step if EBM critical appraisal is to remain credible.
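The proposed extra item is simple to express as a rule. The following is a minimal Python sketch of a simplified GRADE-style rating. It assumes four certainty levels, a starting level of high for randomized and low for observational evidence, and the loss of one or two levels per serious or very serious concern. The `conflict_of_interest` domain is the hypothetical new item proposed above and is not part of GRADE. All names are illustrative, and real GRADE judgements are not mechanical.

```python
from enum import IntEnum


class Certainty(IntEnum):
    VERY_LOW = 1
    LOW = 2
    MODERATE = 3
    HIGH = 4


# Usual GRADE downgrading domains plus one HYPOTHETICAL extra item.
DOWNGRADE_DOMAINS = (
    "risk_of_bias",
    "inconsistency",
    "indirectness",
    "imprecision",
    "publication_bias",
    "conflict_of_interest",  # hypothetical: conducted/funded by an interested party
)


def rate_certainty(randomized: bool, downgrades: dict, upgrades: int = 0) -> Certainty:
    """Start high (randomized) or low (observational), subtract 1 or 2 levels
    per concern, add any upgrades, and clamp to the four levels."""
    level = Certainty.HIGH if randomized else Certainty.LOW
    for domain, levels in downgrades.items():
        if domain not in DOWNGRADE_DOMAINS:
            raise ValueError(f"unknown domain: {domain}")
        if levels not in (0, 1, 2):
            raise ValueError("downgrade by 0, 1 or 2 levels per domain")
        level -= levels
    level += upgrades
    return Certainty(max(Certainty.VERY_LOW, min(Certainty.HIGH, level)))


if __name__ == "__main__":
    # A sponsor-run randomized trial with no other flagged concerns.
    print(rate_certainty(True, {"conflict_of_interest": 1}).name)
```

Under these assumptions, an otherwise unblemished industry-run randomized trial would be rated moderate rather than high, which is the kind of explicit downgrade the proposal describes.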

The first four proposals are not novel. However, formally modifying evidence-ranking schemes to explicitly downgrade research with conflict of interest bias does not appear to have been previously considered by the EBM movement. This would be a straightforward and easily implemented step that may facilitate a move from the glorification of ‘EBM’ per se to a clear acknowledgement that not all evidence is created equal, and that industry-funding bias is pervasive and misleading.

Conclusion

We have demonstrated that unfavourable trials are frequently left unpublished and so are unavailable to doctors and patients. Through processes of selective publication and manipulation of study design, industry-sponsored studies are weighted to be favourable to their product. Simply put, industry-sponsored evidence is incomplete and biased. Most intervention studies are industry sponsored. This means that the overall evidence about many interventions is incomplete and biased. As a result patients may be given less effective, harmful or more expensive treatments. We have proposed some possible remedies, including that the EBM movement explicitly downgrade any research produced by those with a vested interest in the results.
