The question of “when to randomize?” is arguably too complicated to answer, if it can be answered definitively at all, in an article of several thousand words. Broadly, randomized studies bear on research and decision making in both scientific and policy settings, and they encompass a wide range of intersecting and overlapping topics. Representative situations include how an individual investigator decides whether a randomized trial is warranted to study a particular research question; when the Patient-Centered Outcomes Research Institute should commission randomized or non-randomized research; and whether the Food and Drug Administration always needs evidence from randomized trials to approve new drugs.
After a brief discussion of scientific terms, evidence-based medicine (EBM), and costs and ethical considerations, the remainder of this article presents an overarching framework to help clarify, from a methodological perspective, what randomization does and does not accomplish. Specifically, an argument is presented, with representative examples, that focuses on comparative effectiveness research (CER) and promotes what can be called medicine-based evidence (MBE). This report is conceptual rather than technical, commenting on general issues rather than details, although work on such details is important and available (e.g., see other publications from symposia sponsored by the Agency for Healthcare Research and Quality). Also of note, comments regarding particular articles are not intended as ad hominem criticism of the authors but rather as part of a scientific discussion to advance understanding.
TERMS DESCRIBING SCIENTIFIC RESEARCH
Randomized controlled trials (RCTs) involving human participants are a type of research design within the domain of patient-oriented research. The term patient-oriented research itself has official and unofficial definitions, but herein it refers to scientific investigations involving an “intact” person (not only his or her organs) or collections of persons, such as when evaluating health care delivery systems. Patient-oriented research is often viewed informally as an alternative to “basic science,” but it is a fallacy that basic science is limited to or synonymous with laboratory or so-called wet-bench research. Rather, the academic disciplines of clinical epidemiology and biostatistics are basic sciences of patient-oriented research in that they represent a search for knowledge itself. Subsequent applied science uses such knowledge to search for solutions to practical problems; an example in the domain of patient-oriented research is when studies of comparative effectiveness use available methods of epidemiology (e.g., various structures of randomized trials or other designs) and statistics (e.g., strategies of multivariable analysis or alternative analytic techniques).
Contemporary discussions of randomization inevitably encounter the paradigm of EBM. As described originally, EBM “de-emphasizes intuition, unsystematic clinical experience, and pathophysiologic rationale as sufficient grounds for clinical decision making and stresses the examination of evidence from clinical research.” In evaluations of the aforementioned clinical (patient-oriented) research, randomized trials were deemed a higher grade of evidence than were non-randomized observational studies.
Although subsequent revisions of the definition of EBM have modified this perspective, hierarchies of research design have been adopted and have become entrenched. Figure 1 shows a representative example, with systematic reviews as the highest form of evidence, followed by RCTs, cohort and case–control studies, case series and case reports, and, last, opinion.
If the medical literature is systematically evaluated, however, it is apparent that major problems exist with research hierarchies, as will be discussed later. Although randomized trials have inherent advantages in minimizing bias, the limitations of RCTs are often overlooked, and the ability of observational studies to account for bias is regularly discounted.
Several years after EBM's launch, its problematic influence was described as follows: “Although advocates of EBM acknowledge the contribution of all forms of evidence, the differential value attached to different sources has led to naïve and simplistic attempts to omit the traditional processes of interpretation, synthesis, and extrapolation, and to draw wide-ranging conclusions from trial data without adequate scientific discussion.” Another perspective was that EBM proponents had defined “EBM in an overly broad, indeed almost vacuous, manner.” An analysis of this debate follows, after brief consideration of other topics.
COSTS AND ETHICAL ISSUES
The infrastructure required for a randomized trial is typically more expensive than that of alternative designs, for reasons whose details are beyond the scope of this article. “Estimating the cost of new drug development: Is it really $802 million?” was the title of an article evaluating and endorsing a prior publication that had itself reviewed randomized trials conducted from 1989 to 2002. Considered together, these reports estimate RCT costs in the range of $500 million to $2 billion per drug developed, not including marketing or other expenses.
Evaluating all questions that arise (for all drugs, devices, and programs) with RCTs would therefore be enormously expensive and even prohibitive. This viewpoint had been articulated more than 25 years ago—as in the assertion that randomized trials cannot “answer all the clinical questions that will arise in the future for a burgeoning diagnostic and therapeutic technology”—yet the influence of EBM continues to hinder a rational evaluation of the trade-offs involved. Indeed, rigid demands for conducting RCTs before decisions are made, or before actions are taken, have been referred to as creating problems of “RCT-myopia” and “evidence-based paralysis.”
To acknowledge ethical considerations briefly: a central dilemma for RCTs is whether it is ethical to offer patients treatments selected by chance. The standard response is that random assignment is ethical when equipoise exists, that is, when the relative therapeutic merit of the drugs or other interventions being compared is the subject of professional uncertainty. Yet an article focusing on equipoise identified several problems with this response.
First, the definition of equipoise is arguably imprecise; it is difficult to specify exactly how therapeutic uncertainty is determined. Second, if the answer relies on expert opinion, then such opinion is often criticized (especially according to the teachings of EBM!). A third problem involves the limitations of evaluating whether a drug works, such as when surrogate outcomes are used. (For example, when are we justified in allowing microalbuminuria to substitute for end-stage renal disease? If, hypothetically, several trials show differences in microalbuminuria among treatment options, are future similar trials ethical?) A fourth problem is that treatments can be expensive, yet costs are usually not included in the definition or application of equipoise. Finally, premature termination of RCTs, invoked by so-called stopping rules, can overestimate treatment effects and thereby influence judgments of equipoise.
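The mechanism by which stopping rules can overestimate treatment effects can be illustrated with a small simulation. The sketch below is hypothetical and is not drawn from any cited trial: it assumes normally distributed outcomes with unit variance, a true effect of 0.2, a single interim analysis at half the planned sample, and an early-stopping threshold of z > 2.5, all chosen only for illustration. Trials that stop early do so because their interim estimate happened to be large, so their average estimate tends to exceed the true effect.

```python
import random
import statistics


def simulate_trials(true_effect=0.2, n_per_arm=200, z_stop=2.5, reps=1000, seed=1):
    """Simulate two-arm trials with one interim look.

    All parameters are illustrative assumptions, not taken from any cited trial.
    Returns (early, completed): effect estimates from trials stopped at the
    interim analysis and from trials that reached the planned sample size.
    """
    rng = random.Random(seed)
    half = n_per_arm // 2
    # Standard error of a difference in means with unit variance in each arm.
    se_half = (2.0 / half) ** 0.5
    early, completed = [], []
    for _ in range(reps):
        treated = [rng.gauss(true_effect, 1.0) for _ in range(n_per_arm)]
        control = [rng.gauss(0.0, 1.0) for _ in range(n_per_arm)]
        interim = statistics.fmean(treated[:half]) - statistics.fmean(control[:half])
        if interim / se_half > z_stop:
            early.append(interim)  # stopped early for apparent benefit
        else:
            completed.append(statistics.fmean(treated) - statistics.fmean(control))
    return early, completed


early, completed = simulate_trials()
print(f"stopped early: {len(early)} trials, mean estimate {statistics.fmean(early):.2f}")
print(f"completed: {len(completed)} trials, mean estimate {statistics.fmean(completed):.2f}")
```

Under these assumptions, the early-stopped trials report a mean effect well above the true value of 0.2, whereas completed trials land close to it (slightly below, because the largest interim estimates were removed by stopping). The exact figures depend on the seed and the chosen parameters.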
Taken together, resource limitations and ethical dilemmas raise pragmatic and philosophical concerns about randomized trials that are separate from scientific ones. Nonetheless, researchers usually focus mainly on scientific issues related to randomization itself.
“Evidence” regarding randomized and observational studies (Table 1)
Table 1. Evidence contradicting a hierarchy of research design
Limitations of randomized trials
• Randomized trials on the same topic are often contradictory.
• Meta-analyses and large RCTs often disagree.
Strengths of observational studies
• Strategies can strengthen observational methods.
• Observational studies and RCTs often agree head-to-head.[19-23]
• Non-randomized treatments are effective and safe over time.
RCT = randomized controlled trial.
Representative examples from an extensive and consistent literature on research design indicate three limitations of randomized trials. First, RCTs on the same topic are often contradictory, as was shown as early as the 1980s. Second, meta-analyses and large RCTs on the same topic often disagree, despite the gold-standard status of both. Third, RCTs have limited generalizability, as shown in numerous studies; selected papers highlight, for example, how trials in cardiology usually exclude patients with renal disease, and how thrombolytic therapy for stroke was found to be effective in randomized trials but harmful in clinical practice and problematic in its medicolegal implications.
Conversely, the literature points to several strengths of observational studies. Specific strategies can be used to protect the validity of observational methods (as reported more than 20 years ago); multiple articles show that well-designed observational studies and RCTs often agree when compared head to head;[19-23] and evidence indicates that treatments approved on the basis of non-randomized data in the pre-EBM era are effective and safe (including numerous oncology drugs that have survived the test of time and remain in clinical use).
Threat of confounding
The purported superiority of randomized trials rests mainly on their avoidance of confounding, a potential source of bias in observational studies. The prospective, real-time infrastructure that randomized trials require also promotes the collection of high-quality data, but this advantage (which holds over retrospective observational studies, though not over all prospective ones) receives less attention.
Confounding can occur when an extraneous factor is associated with both exposure and outcome. In a simplified example, the average age of people living in Florida (low latitude) tends to be higher than that in Alaska (high latitude), and age is certainly associated with mortality (e.g., deaths per 100 000 population per year). Accordingly, and in a hypothetical context, if an observational study were to assess whether latitude is associated with mortality without considering age differences, then confounding due to age would cause spurious results.
A randomized trial (although randomizing people to where they live is clearly impractical!) would be less vulnerable, because randomization would tend to balance factors that affect outcome, including age. Accordingly, one viewpoint in dealing with confounding is to conduct only randomized trials when assessing therapeutic options, an approach consistent with the tenets of EBM. Alternatively, novel methods have been proposed, including applications[25, 26] of propensity scores and instrumental variables, that are sometimes said to mimic randomization. Such methods are valuable, but when they are viewed as “simulating” randomization, as if randomization always guaranteed useful research, the corresponding limitations are more likely to be ignored.
Another approach to minimize confounding is to use traditional observational methods more rigorously, although this option is currently underappreciated and underutilized. In particular, two reasons exist to suggest that potential bias can be mitigated or is not always a threat.
First, and most importantly, confounding can be addressed by understanding the basic science elements involved in general, as well as in each specific research context. The greatest challenges are usually identifying clinically relevant factors that are related to the intervention and outcome and collecting high-quality data for those factors. Based on the research question addressed, the study design utilized, and the actual data collected, various strategies—including matching, standardization, or adjustment with mathematical models—can then account for confounding, including age differences in the previously cited (simplified) example. The challenge is more difficult when using archived or database information, but methods exist to help address such situations.
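The standardization strategy mentioned above can be sketched for the simplified Florida/Alaska example. All counts below are invented for illustration and are not real data; the two age strata, the rates, and the use of the combined population as the standard are assumptions. In this hypothetical, age-specific mortality is identical in both states, so any crude difference reflects age composition alone, and direct standardization (weighting each stratum-specific rate by a common standard population) removes it.

```python
# Hypothetical (population, deaths) per age stratum; NOT real data.
REGIONS = {
    "Florida": {"under_65": (700_000, 1_400), "65_plus": (300_000, 12_000)},
    "Alaska": {"under_65": (900_000, 1_800), "65_plus": (100_000, 4_000)},
}


def crude_rate(strata):
    """Deaths per 100,000, ignoring age structure."""
    population = sum(pop for pop, _ in strata.values())
    deaths = sum(d for _, d in strata.values())
    return deaths * 100_000 / population


def standardized_rate(strata, standard):
    """Direct standardization: weight each stratum-specific rate (per 100,000)
    by that stratum's share of the standard population."""
    total = sum(standard.values())
    return sum(
        (deaths * 100_000 / pop) * standard[stratum] / total
        for stratum, (pop, deaths) in strata.items()
    )


# Standard population: both regions combined (an illustrative choice).
STANDARD = {
    stratum: sum(REGIONS[region][stratum][0] for region in REGIONS)
    for stratum in ("under_65", "65_plus")
}

for name, strata in REGIONS.items():
    print(f"{name}: crude {crude_rate(strata):.0f}, "
          f"age-standardized {standardized_rate(strata, STANDARD):.0f}")
```

With these invented counts the crude rates are 1340 (Florida) and 580 (Alaska) deaths per 100,000, while both age-standardized rates are 960, so the apparent latitude effect disappears once age is accounted for. Matching or model-based adjustment would be alternative ways to reach the same end.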
Second, the threat of unknown confounding factors is exaggerated. For example, if a factor is imbalanced with regard to treatment but does not affect the outcome, or the converse, then confounding does not occur. (Of note, randomized trials often have such imbalances, unless the trial is very large, because randomization guarantees random but not equal allocation.) Alternatively, if a confounding factor is known to the patient or provider but not the researcher in an observational study, then it is not actually unknown; rather, it is unmeasured and in need of better basic science work (as mentioned above) on the clinical topic involved. If a potential confounding factor is truly unknown, however, then clinicians cannot intervene preferentially based on that factor, and no bias is introduced.
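The parenthetical point that randomization guarantees random but not equal allocation can be checked with a short simulation. The sketch assumes simple 1:1 coin-flip randomization without stratification or blocking, a hypothetical binary baseline factor with 30% prevalence, and trial sizes chosen only for illustration; it reports the average absolute between-arm difference in the proportion of patients with that factor.

```python
import random
import statistics


def mean_imbalance(n_patients, prevalence=0.3, reps=500, seed=7):
    """Average absolute between-arm difference in the proportion of patients
    with a hypothetical binary baseline factor, under simple coin-flip
    randomization (an illustrative assumption, not a cited trial design)."""
    rng = random.Random(seed)
    differences = []
    for _ in range(reps):
        arms = ([], [])
        for _ in range(n_patients):
            has_factor = rng.random() < prevalence
            arms[rng.randrange(2)].append(has_factor)  # coin-flip allocation
        if not arms[0] or not arms[1]:
            continue  # skip the rare simulated trial with an empty arm
        proportions = [sum(arm) / len(arm) for arm in arms]
        differences.append(abs(proportions[0] - proportions[1]))
    return statistics.fmean(differences)


for n in (20, 200, 2000):
    print(f"{n} patients: mean imbalance {mean_imbalance(n):.3f}")
```

Under these assumptions the typical imbalance shrinks as trials grow (roughly with the square root of the number of patients) but does not vanish. As the text notes, such an imbalance produces confounding only if the factor also affects the outcome.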
Need for MBE
Beyond addressing confounding more rigorously, a revised and more cogent strategy is desirable when evaluating studies overall. The current approach, endorsed by EBM, treats validity (whether results are “true” for the patients enrolled in a study) as the primary issue, and randomization is therefore considered paramount. Generalizability (whether the results apply to a broader population of patients with the condition) is considered subsequently, as a secondary concern.
The emphasis in MBE is instead on accuracy, defined for this purpose as whether results are true for patients who would receive the exposure or intervention; randomization is therefore relevant, but it is not the only important issue. Specifically, an MBE assessment of a study considers validity and generalizability together.
In implementing MBE, researchers should focus systematically on three domains: the baseline characteristics of patients, the exposure (primary intervention and cointerventions), and aspects of the outcome. These domains should be regarded as equal in relevance and importance to concealment of allocation and the other criteria promoted by EBM. The domains of MBE are, of course, already evaluated in current research practice, but most often informally rather than systematically.[29, 30]
Both the pitfalls of EBM and the potential of an MBE approach have been discussed previously,[29, 31-33] without using the phrase MBE, including in evaluations of studies assessing outcomes of women taking hormone replacement therapy. In summary, a common criticism is that observational studies suggested benefits from hormones, whereas RCTs suggested either no benefit or harm. The simplistic but mistaken premise is that the RCTs were correct and the observational studies were flawed. In reality, a methodological consensus has developed[35-37] that neither observational nor randomized studies held superior “truth” regarding hormone replacement therapy.
A more recent controversy involves screening for prostate cancer. As evidence of the strong influence of research design hierarchies, a review of screening for prostate cancer by the US Preventive Services Task Force in 2002 considered case–control studies, but as of 2008, the updated recommendations summarily excluded such studies. The stated reason was to “avoid potential sources of confounding that are inherent in nonrandomized studies”—as if a mystical force called confounding would cause harm, the way miasma was once thought to cause disease. Yet, a meta-analysis on this topic included a trial with severe flaws, following the EBM practice of ignoring high-quality observational studies while considering any and all RCTs.
The published data on this topic, from both case–control and randomized studies, have overlapping point estimates, but the two most prominent RCTs have discordant results. In brief, a trial conducted in the USA did not find that screening with prostate-specific antigen reduced cause-specific mortality, whereas a trial conducted in Europe did find a benefit. Although the details of these two studies have been scrutinized, the emphasis in the current EBM era is mainly on finding “randomized proof” that screening works, discounting the methodological reality that any one study may not be definitive.
An MBE perspective would evaluate available studies systematically according to the criteria shown in Table 2 and would ask the following: In deciding whether screening for prostate cancer is effective, do we know for whom, performed how, and for what outcome exactly? The quality of each study, whether randomized or not, should be assessed, but we should also rigorously consider who the patients were in terms of age and comorbidity, exactly how screening was done, and what endpoints were and were not assessed.
Table 2. Foci for evaluating accuracy of randomized and observational studies, using a medicine-based evidence approach
Example: screening for prostate cancer and mortality
Patients (for whom?)
- Indolent versus aggressive cancer
Exposure (performed how?)
- PSA alone (vs. velocity, density, etc.)
- With or without digital rectal exam
- Compared with usual care or other comparator
Outcome (for what outcome exactly?)
- Overall versus cause-specific mortality
- Duration of follow-up
- With or without consideration of morbidity
PSA = prostate-specific antigen.
As emphasized, the quality of any study is a crucial consideration when judging what it puts forward as evidence. (Discussing the many aspects of quality is beyond the scope of this report, but available checklists[46, 47] are more useful as guides than as definitive assessments.) In addition, the concepts shown in Figure 2 can help informally in recognizing patterns of study design as used in research practice.
In Figure 2a the y-axis shows observational versus experimental design, to help evaluate issues of (internal) study validity; experimental studies refer to RCTs, whereas observational studies include descriptive and analytic (e.g., cohort or case–control) studies. The x-axis shows a spectrum ranging from limited to expansive clinical characteristics of patients and outcome variables to help determine generalizability; limited refers to a restricted patient population and surrogate outcomes, whereas expansive refers to broader inclusion criteria and prominent outcomes. The four quadrants (I–IV) serve to classify and organize common types of research studies, providing a context for evaluating clinical and methodological issues without invoking a research design hierarchy. In all four quadrants, clinical features of the primary intervention and cointerventions are also relevant (e.g., difficult to deliver vs. representative of current practice), whether administered in an observational or randomized context.
In real-world research practice, represented in Figure 2b, the lower left (Quadrant III) includes randomized trials described typically as evaluating efficacy—whether an intervention works under more controlled conditions, such as enrolling patients without comorbid ailments or evaluating surrogate outcomes. The lower right (Quadrant IV) includes randomized trials evaluating effectiveness, such as large, simple trials—with, for example, broad inclusion criteria or mortality as an outcome.
Rather than imposing a hierarchy of research design, MBE instead gives observational studies equal consideration. The upper left (Quadrant I) highlights descriptive studies of deterministic phenomena and would include the landmark demonstrations of insulin lowering blood glucose and penicillin treating infection. Finally, the upper right (Quadrant II) includes cohort or case–control studies evaluating the benefits or harms of interventions.
Figure 2 does not include any new information; rather, the concepts serve to counter the current mindset that RCTs are always necessary or provide definitive findings. For example, RCTs in the lower left quadrant can demonstrate the efficacy of an intervention in a given context, but we should not expect them to provide flawless results for a broad population of patients. Conversely, rigorous observational studies in the upper right quadrant are currently underutilized in examining benefits of the same medical interventions, although evaluating harm is usually viewed as more acceptable. Specifically, and depending on the investigator, reviewer, or funding agency, studying benefits using observational methods in CER is unfortunately not always deemed appropriate.
Poorly designed observational studies, or any thinly disguised pseudo-research, should not be defended by the arguments made in this report; nor does this report deny that randomized trials have contributed greatly to clinical medicine. Nonetheless, in the name of advancing science by removing bias (confounding), EBM has ironically overseen the development of an intellectual bias. Representative comments in this regard warn that conducting more observational studies of therapeutic interventions would bring the threat of “considerable dangers to clinical research and even to the well-being of patients” and suggest that “perhaps [evidence-based medicine has not] tried hard enough to convert the skeptics [to believing in the necessity of randomization].”
An open debate would consider opinions—including strong arguments—that question the supremacy of randomization. For example, “it is in fact very difficult to see any cogent reason for thinking as highly of RCTs as the medical community does”; “none of [the arguments made in favor of randomization] supplies any practical reason for thinking of randomization as having unique epistemic power”; and “it would seem better to try to convince the medical profession of [the lack of need for RCTs], rather than turn their delusions into an argument for pandering to them!”
Another writer opined: “The notion that evidence can be reliably or usefully placed in ‘hierarchies’ is illusory. Rather, decision makers need to exercise judgement about whether (and when) evidence gathered from experimental or observational sources is fit for purpose.”
In summary, the EBM approach overlooks the limitations of randomized trials and undervalues the contribution of observational studies, although the results of both have been shown to provide valid and useful information. A focus on the accuracy of results from studies of various designs is warranted. Accordingly, to help improve health care and patient outcomes, CER requires less EBM and more MBE, that is, more scientific and clinical judgment.