• Open Access

A CTSA Agenda to Advance Methods for Comparative Effectiveness Research


  • Mark Helfand M.D., M.S., M.P.H.,

    1. Oregon Clinical & Translational Research Center, Oregon Health & Science University and Department of Hospital and Specialty Medicine, The Portland VA Medical Center, Portland, Oregon, USA
  • Sean Tunis M.D., M.Sc.,

    1. Department of Medicine, Johns Hopkins School of Medicine, Baltimore, Maryland, USA
    2. Center for Medical Technology Policy, Baltimore, Maryland, USA
    3. Department of Surgery and Institute for Health Policy Studies, University of California San Francisco, San Francisco, California, USA
    4. Center for Health Policy, Stanford University, Stanford, California, USA
  • Evelyn P. Whitlock M.D., M.P.H.,

    1. The Center for Health Research, Kaiser Permanente Center for Health Research, Portland, Oregon, USA
    2. Oregon Evidence-Based Practice Center, Oregon Health & Science University, Portland, Oregon, USA
  • Stephen G. Pauker M.D.,

    1. Tufts Clinical and Translational Science Institute and Division of Clinical Decision Making, Informatics and Telemedicine, Department of Medicine, Institute of Clinical Research and Health Policy Studies, Tufts Medical Center, Boston, Massachusetts, USA
  • Anirban Basu Ph.D.,

    1. Departments of Health Services and Pharmacy, University of Washington, Seattle, Washington, USA and The National Bureau of Economic Research, Cambridge, Massachusetts, USA
  • Jon Chilingerian Ph.D.,

    1. School for Social Policy and Management, The Heller School, Brandeis University, Waltham, Massachusetts, USA
  • Frank E. Harrell Jr. Ph.D.,

    1. Department of Biostatistics, Vanderbilt University School of Medicine, Nashville, Tennessee, USA
  • David O. Meltzer M.D., Ph.D.,

    1. Center for Health and the Social Sciences and Hospital Medicine, The University of Chicago, Chicago, Illinois, USA
  • Victor M. Montori M.D.,

    1. Healthcare Delivery Research Program, Translating CER Core, Mayo CTSA and Department of Health Sciences Research, Division of Healthcare and Policy Research, Mayo Clinic, Rochester, Minnesota, USA
  • Donald S. Shepard Ph.D.,

    1. Schneider Institute of Health Policy, The Heller School, Brandeis University, Waltham, Massachusetts, USA
  • David M. Kent M.D.,

    1. Tufts Clinical and Translational Science Institute, Department of Medicine, Institute of Clinical Research and Health Policy Studies, Tufts Medical Center, Boston, Massachusetts, USA
  • The Methods Work Group of the National CTSA Strategic Goal Committee on Comparative Effectiveness Research

Mark Helfand (helfand@ohsu.edu)


Clinical research needs to be more useful to patients, clinicians, and other decision makers. To meet this need, more research should focus on patient-centered outcomes, compare viable alternatives, and be responsive to individual patients’ preferences, needs, pathobiology, settings, and values. These features, which make comparative effectiveness research (CER) fundamentally patient-centered, challenge researchers to adopt or develop methods that improve the timeliness, relevance, and practical application of clinical studies.

In this paper, we describe 10 priority areas that address 3 critical needs for research on patient-centered outcomes (PCOR): (1) developing and testing trustworthy methods to identify and prioritize important questions for research; (2) improving the design, conduct, and analysis of clinical research studies; and (3) linking the process and outcomes of actual practice to priorities for research on patient-centered outcomes. We argue that the National Institutes of Health, through its clinical and translational research program, should accelerate the development and refinement of methods for CER by linking a program of methods research to the broader portfolio of large, prospective clinical and health system studies it supports. Insights generated by this work should be of enormous value to the Patient-Centered Outcomes Research Institute (PCORI) and to the broad range of organizations that will be funding and implementing CER.

Clin Trans Sci 2011; Volume 4: 188–198

Background and Purpose

Comparative effectiveness research (CER) generates evidence to inform patients, providers, and other decision makers, responding to their expressed needs for comparative information about which interventions are most effective for which patients under specific circumstances. CER is fundamentally patient-centered in that it seeks evidence that will support decision making that is responsive to individual patients’ preferences, needs, pathobiology, setting, and values.1 CER targets information gaps that reflect important uncertainty in clinical practice, as defined through active dialogue with the clinicians and patients who are the intended end users of this research. The deliberate and sustained attention to and involvement of decision makers is the characteristic of CER that most clearly distinguishes it from much past clinical and outcomes research.2

In their reports on CER, the Institute of Medicine (IOM) and the Federal Coordinating Council highlighted the importance of establishing new methods and infrastructure to help make CER more valid, relevant, generalizable, efficient, and feasible (see Appendix).3,4 Each of these objectives derives directly from the primary purpose of CER—to produce information that is useful for making clinical and health policy decisions.

Developing methods that can increase the relevance, generalizability, efficiency, and feasibility of health research while adequately preserving internal validity is a central challenge of the emerging CER enterprise. Through the Clinical and Translational Science Awards (CTSA) program, the National Institutes of Health (NIH) is well positioned to provide leadership in the development of these new methods, in part because the CTSA program already has major responsibility for designing and implementing health research to advance public health. The unique importance of the health research performed by the CTSA program provides the motivation and the obligation to develop and apply new methods that are equal to the important challenges that have been identified.

While it is clear that the CTSA program has both the capacity and the responsibility to advance the methods needed to perform CER, the portfolio of methodological issues requiring attention has not yet been articulated. This paper briefly describes an initial set of methodological issues that a working group of CTSA representatives identified as high priority. Because the information needs of patients, clinicians, and payers are dynamic, CER and the methods to conduct CER will necessarily evolve over time. The methods issues discussed below are intended to serve as a starting point for discussion, rather than as a final prioritized agenda of methods inquiry for the CTSA program. Within the CTSAs and among scientists, there are differences of opinion about the importance of, or optimal approach to, many of these topics, and it is unlikely that consensus will be rapidly achieved. However, the CER methods working group believes that there is an urgent need for a forum within the CTSA program to support ongoing methodological research and dialogue on these and related methods topics. More sustained, coordinated, and collaborative attention to these topics is essential to ensure that the CTSA program develops and applies methods that contribute meaningfully to the CER enterprise, and through this work improves clinical and health policy decision making.

Characteristics of CER

Although CER overlaps with other types of clinical research, it has several distinguishing characteristics.3,5,4 Specifically, CER:

  • 1) meaningfully involves patients, the public, clinicians, payers, policy makers and other relevant decision makers in prioritizing topics and developing questions for study;
  • 2) compares viable alternatives that each represent a potential standard of care;
  • 3) evaluates a comprehensive array of health-related outcomes—both benefits and harms—not just a likely benefit or advantage;
  • 4) includes diverse patient populations and individualizes results, making research applicable to subgroups and individuals; and
  • 5) keeps research up to date.

The disciplines that underlie CER, such as clinical epidemiology, evidence synthesis, decision sciences, shared decision making, guideline development, outcome measurement, and implementation sciences, are not unique to this type of research. Yet these disciplines, at present, do not have at their disposal a clearly articulated plan or path for conducting research with explicit attention to the unique objectives of CER. In order to reach that goal, the following three critical needs must be met:

  • Critical Need 1 is to develop and test trustworthy methods to identify and prioritize important questions for research about the comparative risks and benefits of health care choices.

  • Critical Need 2 is to improve the design, conduct, and analysis of clinical research studies.

  • Critical Need 3 is to implement measurement and feedback in clinical practice to improve clinical and health policy decision making and link the process and outcomes of actual practice to CER priorities and results.

To develop and apply methods that directly address these critical needs, we propose 10 priority areas for methodological research that are critical to the ability of the CTSA program to contribute to the emerging CER enterprise. Each of these areas is pertinent to one or more of these critical needs (Table 1). Below, we discuss the rationale for each methodological area and its role in CER.

Table 1.  Methodology priorities for comparative effectiveness research.
Critical Need 1: Methods to identify and prioritize important questions for research about the comparative risks and benefits of health care choices.
1. Test methods to involve consumers, the public, and clinical decision makers in identifying priorities for research.
2. Test methods to conduct systematic comparative effectiveness reviews and keep them up to date.
3. Expand and test methods for using systematic reviews alongside modeling to estimate the value of comparative effectiveness research (CER) studies and prioritize a clinical research agenda.
Critical Need 2: Improve the design, conduct, and analysis of clinical research studies.
4. Conduct research to define the role of different categories of CER in the research cycle.
5. Develop methods to better address the heterogeneity of treatment effects.
6. Increase the efficiency of comparative effectiveness trials.
7. Incorporate preferences, value, resource use, and utility into the design of clinical research studies.
8. Evaluate statistical, analytical, epidemiological, and logistical methods for pragmatic trials.
9. Improve methods to recruit and retain patients/populations excluded or underrepresented in current trials and other research.
Critical Need 3: Improve decision making, linking practice to the CER agenda.
10. Evaluate the optimal and timely translation of CER results.

There is unlikely to be another federal research program that is equally capable of taking on these issues, and for that reason, attention to this work is not only desirable for the CTSA program, it is essential to the success of CER.

CER Methods Priorities for CTSAs

Critical Need 1: Methods to identify and prioritize research questions about the comparative risks and benefits of health care choices.

1. Methods to involve consumers, the public, and clinical decision makers in identifying priorities for research. A basic observation underlying interest in CER is that existing research often fails to address questions that underlie uncertainty in practice.2 CER therefore invokes the principle that important questions arise from practice, where “practice” is defined broadly as the experience and circumstances of those who have a condition or care for those with a condition or, for preventive services, might be a candidate for an intervention.6 Involvement of patients, caregivers, and clinicians can help identify patient-important outcomes (including risks as well as benefits, and utilities and preferences related to outcomes), meaningful comparisons, important subgroups that may moderate outcomes, and broad and diverse populations of interest, as well as characteristics that may be important mediators of outcomes.7 Such involvement can also help to avoid bias in designing a study, for example, by ensuring that a study is designed to evaluate the balance of benefits and harms, not just a likely benefit or advantage, from several perspectives.

CER has begun to engage the public in identifying and prioritizing research questions.3,8,9 The direction within the research community is to engage those not typically part of the research cycle. For example, practice-based research networks involve practitioners in identifying and answering real-world questions, and participatory research initiatives10 engage communities from start to finish in the identification, prioritization, conduct, interpretation, and dissemination of research.11–13

The Agency for Healthcare Research & Quality (AHRQ) Effective Health Care program is conducting methodological and demonstration projects to better define how the public should be engaged in CER, the best time points for engagement, and the means to ensure adequate preparation and experience in the process. Methods in this area involve an expanded notion of “community engagement,” a cornerstone of the CTSAs, in which community practitioners and members of the public are invited to become more deeply engaged in the early phases of framing research questions and designing studies, rather than engaging them primarily to enhance recruitment of patients to trials that have been designed by others.

These efforts focus on revealing issues that matter to decision makers and incorporating them into the design of CER to improve its usability and patient-centeredness. Engaging decision makers in early stages of the research process can help produce findings that are more relevant to practical decisions, encourage public transparency, and empower citizens.14,15 While a range of methods have been used to engage the public, only a few studies have compared alternative approaches, and most of these have addressed priorities in the provision of health services as opposed to research prioritization.16 One promising method that has been used in a variety of settings has been dubbed “citizen forums” or “juries.”17–19 This approach seeks to emulate the use of a jury (of peers) in a legal situation, rendering, in this case, a judgment about important health care priorities.20 Another approach is to include consumers as full members of a stable team of researchers and policy makers.16 Both approaches give lay individuals the opportunity to spend sufficient time to understand the technical issues underlying a complex decision or group of decisions so that they can be full participants, rather than consultants, in priority-setting.

2. Methods for conducting systematic comparative effectiveness reviews and keeping them up to date. Systematic reviews play a critical role in CER, serving as an essential step in translating research findings and identifying gaps in the evidence that should inform future research studies. Over the past few years, systematic review methods have been updated with the goal of making these reviews more responsive to public input, more transparent, and more focused on the specific clinical and health policy decisions faced by patients, clinicians, and payers.21

Explicit, defensible, consistent methodology for systematic reviews is essential because they are used by patients, clinicians, health plans, and practice guideline developers to make important decisions. In 2002, the states of Oregon, Washington, and Idaho began using systematic drug class reviews in preferred drug decision making.22 These reviews incorporated several features that later became widespread principles for conducting CER, such as public involvement in identifying and refining questions, attention to both benefits and harms, and public review of draft research papers. The program gave rise to relatively early critiques describing the need for improved methods for systematic reviews23,24 and alerted researchers and funding agencies to the need for a methods research agenda.6 In 2004, the Cochrane Collaboration Methods Group newsletter published a brief summary of the first comprehensive list of research priorities for comparative effectiveness reviews.25 This list emphasized the need for better methods to identify characteristics of clinical trials that provide useful information about applicability and to make inferences from indirect evidence. An AHRQ-supported report affirmed the need for better methods for assessing a study’s limitations and a body of literature’s clinical relevance and overall quality,26 and, in 2005, a series of articles from the Evidence-based Practice Centers identified weaknesses in systematic review methodology.27

Beginning in 2005, AHRQ posted comparative effectiveness reviews, along with guides for clinicians and patients based on them, to a public Web site. Stakeholders’ comments on the reviews identified weak spots in the methodology and led to the development of guidelines for improving consistency in conducting and reporting the research. These public comments have also helped formulate an evolving methods research agenda within AHRQ’s Effective Health Care program.21

Since 2007, AHRQ has embarked on an ambitious program of methodological research to improve the scientific basis of evidence synthesis and, in particular, to improve the efficiency of searching for, identifying, and abstracting data from published articles; to update reviews; and to enhance the suitability of comparative effectiveness reviews for identifying priorities for decisions about future research. The same period has seen tremendous growth in methodological research in evidence synthesis throughout the world, led by international collaborations such as the Cochrane Collaboration, the GRADE Working Group, and the EQUATOR network, and by national health technology assessment agencies in the United Kingdom, Australia, and Canada. In May 2011, the Institute of Medicine will release standards for conducting systematic reviews of clinical effectiveness and priorities for additional methodological research in literature synthesis.

3. Use of systematic reviews alongside modeling to estimate the value of CER studies and prioritize a clinical research agenda.

The primary goal of CER is to generate new information about the comparative benefits of alternative technologies for a specified group of patients. Understanding and quantifying the value of this new information is essential for proper allocation of resources in this area. Information generates value only when certain decisions may be altered based on that information. Therefore, the value of CER is linked to improving treatment choice decisions that will in turn improve clinical outcomes.

In essence, one can view value of information analysis as a comparative analysis of CER itself. That is, it entails comparison of two scenarios: 1) treatment choices that patients or their clinicians make today without additional CER information and 2) potential choices they would make if new CER information becomes available. Each scenario can be linked to clinical outcomes and costs, and the difference between them yields an estimate of the value of comparative effectiveness information.
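The two-scenario comparison can be made concrete with a toy decision problem. The sketch below assumes two treatments, A and B, and two possible "states of the world" whose probabilities stand in for current evidence; all numbers and function names are hypothetical and chosen only for illustration. Scenario 1 picks the treatment with the best expected outcome today; scenario 2 learns the true state first and then picks the best treatment in that state. The difference is the expected value of perfect information per patient.

```python
# Hypothetical decision problem: every number here is illustrative only.
# Each entry: (probability under current evidence, outcome with A, outcome with B),
# where "outcome" could be, e.g., expected QALYs per patient.
SCENARIOS = [
    (0.6, 10.0, 8.0),   # state in which A is truly better
    (0.4, 7.0, 11.0),   # state in which B is truly better
]

def value_without_new_information(scenarios):
    """Scenario 1: choose the single treatment with the best expected outcome today."""
    exp_a = sum(p * a for p, a, _ in scenarios)
    exp_b = sum(p * b for p, _, b in scenarios)
    return max(exp_a, exp_b)

def value_with_perfect_information(scenarios):
    """Scenario 2: learn the true state first, then choose the best treatment in it."""
    return sum(p * max(a, b) for p, a, b in scenarios)

def value_of_information(scenarios):
    """Difference between the two scenarios (expected value of perfect information)."""
    return value_with_perfect_information(scenarios) - value_without_new_information(scenarios)

if __name__ == "__main__":
    print(f"Value of perfect information per patient: {value_of_information(SCENARIOS):.2f}")
```

For these made-up numbers it prints 1.20: B has the higher expected outcome today (9.2 versus 8.8), so new information only changes decisions in the 60% of cases where A is truly better, each gaining 2.0.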

Value of information analysis is particularly suited to the CTSA program because the translation of research evidence into better treatment choices lies at the core of generating value.28 CTSA researchers can use these techniques to study a wide range of questions regarding how to target CER within certain clinical areas, the barriers to translation or implementation of new evidence, and whether the generalizability of current evidence is weakened by heterogeneity in case-mix.

As shown in Table 2, varying concepts of value of information analysis can help improve decision making within CTSA programs. The expected value of perfect information (EVPI) can be calculated using the probability that certain treatment choices based on current knowledge are suboptimal in terms of patient welfare and the potential welfare gain among patients that could be realized if these decisions could be remedied (with perfect information on comparative effectiveness). EVPI directly produces a maximum value of a CER study and relies on the assumption that inefficient treatment choices are remedied by perfect translation of perfect evidence. In practice, however, translation is imperfect, and all CTSA programs engage in varying levels of effort to improve it. One can form an estimate of the expected value of perfect implementation (EVPIM) or the expected value of partial perfect information to quantify the potential value of such efforts.29 EVPIM is given by the difference between EVPI and the plausible EVPI with imperfect translation. An estimate of the maximum value of research (MVR), which sets an upper bound on the value of research for a clinical area30 and represents the maximal value of improving clinical outcomes for a patient population, can be obtained using burden of illness-type studies. Finally, the expected value of individualized care can be invoked to quantify the value of learning about heterogeneity in treatment effects versus learning about the average treatment effect more precisely.31 These concepts have been expanded to strengthen the methodology for considering the value of a portfolio of proposed studies rather than only individual studies.32
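A minimal sketch of EVPI and of EVPIM as this paragraph defines it (EVPI minus the plausible EVPI under imperfect translation) follows. It assumes a two-option choice in which the incremental net benefit X of option B over option A is normally distributed, so EVPI per patient has the closed form E[max(0, X)] - max(0, E[X]). The `uptake` parameter is a hypothetical simplification: only that fraction of patients whose optimal choice changes actually switches, so only that fraction of EVPI is realized. Population scaling with 3% annual discounting is likewise an illustrative assumption, not a method taken from the cited work.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()

def per_patient_evpi(mean_inb, sd_inb):
    """EVPI when the incremental net benefit X of option B over A is
    Normal(mean_inb, sd_inb): E[max(0, X)] - max(0, E[X]).
    Uses E[max(0, X)] = mu * Phi(mu / sd) + sd * phi(mu / sd)."""
    z = mean_inb / sd_inb
    expected_max = mean_inb * STD_NORMAL.cdf(z) + sd_inb * STD_NORMAL.pdf(z)
    return expected_max - max(0.0, mean_inb)

def population_value(per_patient_value, patients_per_year, years, discount_rate=0.03):
    """Scale a per-patient value to an affected population (assumed simple annual discounting)."""
    return sum(per_patient_value * patients_per_year / (1 + discount_rate) ** t
               for t in range(years))

def evpi_with_imperfect_translation(evpi, uptake):
    """Hypothetical assumption: only a fraction `uptake` of patients whose optimal
    choice changes actually switch, so only that share of EVPI is realized."""
    return uptake * evpi

def evpim(evpi, uptake):
    """EVPI minus the plausible EVPI with imperfect translation."""
    return evpi - evpi_with_imperfect_translation(evpi, uptake)

if __name__ == "__main__":
    # Hypothetical inputs: net benefit in dollars per patient,
    # 10,000 affected patients per year over a 10-year horizon.
    evpi = per_patient_evpi(mean_inb=500.0, sd_inb=2000.0)
    pop_evpi = population_value(evpi, patients_per_year=10_000, years=10)
    print(f"population EVPI: {pop_evpi:,.0f}")
    print(f"EVPIM with 40% uptake: {evpim(pop_evpi, uptake=0.4):,.0f}")
```

Under the linear-uptake assumption EVPIM reduces to (1 - uptake) × EVPI. A richer model would let uptake depend on how strongly the new evidence favors switching.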

Table 2.  Types of value of information analysis relevant to comparative effectiveness research.
Type of analysis: Purpose
Expected value of perfect information (EVPI): Determine the maximum value of a new research study in the context of decision making.
Expected value of sample information: Determine optimum sample size and allocation rates in randomized clinical trials.
Expected value of perfect implementation: Determine the maximum value of implementing new research findings.
Expected value of partial perfect information: Identify parameters that contribute most to the EVPI and parameters that may be disregarded as targets for further research; plan the design of sequential research studies.
Maximum value of research: Compare the value of new research with the value of strategies to change the level of implementation.
Expected value of individualized care: Compare the value of individualized treatment decisions (taking preferences into account) with the value of treatment decisions based on traditional population-level analysis.
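To illustrate the "optimum sample size" use of the expected value of sample information, the sketch below uses a standard normal-normal setup with several simplifying assumptions: a normal prior on the incremental net benefit per patient, a two-arm trial with equal allocation (so allocation ratio is not optimized here), normally distributed outcomes with known standard deviation, and a linear trial cost. Before the trial, the posterior mean is itself normally distributed around the prior mean with variance tau² = prior_sd⁴ / (prior_sd² + sampling variance), and EVSI is the expected gain from acting on that posterior mean. All inputs and function names are hypothetical.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()

def expected_gain_over_current(mean, sd):
    """E[max(0, M)] - max(0, E[M]) for M ~ Normal(mean, sd)."""
    if sd == 0:
        return 0.0
    z = mean / sd
    return mean * STD_NORMAL.cdf(z) + sd * STD_NORMAL.pdf(z) - max(0.0, mean)

def evsi_per_patient(n_per_arm, prior_mean, prior_sd, outcome_sd):
    """Per-patient EVSI of a two-arm trial with n_per_arm patients per arm.

    Assumes a normal prior on the incremental net benefit and a trial estimate
    with sampling variance 2 * outcome_sd**2 / n_per_arm. The preposterior
    distribution of the posterior mean is Normal(prior_mean, tau) with
    tau**2 = prior_sd**4 / (prior_sd**2 + sampling_var).
    """
    sampling_var = 2 * outcome_sd ** 2 / n_per_arm
    tau = prior_sd ** 2 / (prior_sd ** 2 + sampling_var) ** 0.5
    return expected_gain_over_current(prior_mean, tau)

def optimal_sample_size(prior_mean, prior_sd, outcome_sd, population,
                        fixed_cost, cost_per_patient, candidates):
    """Pick n per arm maximizing the expected net gain of sampling:
    population-level EVSI minus (hypothetical) trial cost."""
    def net_gain(n):
        evsi = population * evsi_per_patient(n, prior_mean, prior_sd, outcome_sd)
        return evsi - (fixed_cost + cost_per_patient * 2 * n)
    best = max(candidates, key=net_gain)
    return best, net_gain(best)

if __name__ == "__main__":
    n, gain = optimal_sample_size(prior_mean=200.0, prior_sd=1000.0, outcome_sd=5000.0,
                                  population=100_000, fixed_cost=500_000.0,
                                  cost_per_patient=2_000.0, candidates=range(50, 5001, 50))
    print(f"n per arm = {n}, expected net gain of sampling = {gain:,.0f}")
```

EVSI rises with sample size and approaches the EVPI as the trial becomes very large, while trial cost keeps rising linearly; the optimum balances the two.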

Overall, this ensemble of value of research methods provides a systematic, theoretically grounded, and methodologically rigorous tool to assess priorities for CER within the CTSA program and broadly across most other national decision-making entities. It can not only help prioritize today’s CER portfolio but also provide a transparent means of incorporating the information generated by today’s CER into priorities for the next rounds of research.

Summary for Critical Need 1

The past few years have been characterized by intense interest in improving the process for identifying and refining priorities for CER. CER envisions that the selection and design of new studies will be informed by the results of synthesizing existing evidence and applying it in practice. At present, however, methods for adapting systematic reviews for this purpose are poorly developed, and there is no agreement on methods to recruit patients or members of the public and involve them in deliberations about research. While there are a few examples of using systematic reviews, public involvement, and value of information methods effectively to develop an agenda for future research, substantial methodological development will be needed for these mechanisms to fill the role of guiding future research.

Recently, the directors of NIH and AHRQ noted their agencies’ particular expertise in conducting original research studies and evidence syntheses, respectively.33 As a postscript, we note that both agencies, through the CTSAs, have an important role to play in advancing literature synthesis methods. Literature synthesis is a type of translational research that can influence the selection and design of studies and activities across the spectrum of CER. To act effectively in closing gaps in methodological knowledge in the area of literature synthesis, community-based researchers, trialists, and experts in observational studies must collaborate with systematic reviewers.

Rather than keeping these functions separate, connecting centers of methodological expertise in clinical research, community-based research, and systematic reviews is more likely to advance these fields. Often, for example, incorporating special study design features and data elements into clinical studies is the best approach to validate novel methods for conducting literature syntheses.

Critical Need 2: Improving the design, conduct, and analysis of clinical research studies.

4. Evaluate the role of different categories of CER (evidence synthesis, decision analysis, practice-based clinical trials, registries, outcome studies using existing databases, health services research studies, economic studies) in the research cycle.

Since the term “CER” entered the political lexicon, it has generated vigorous discussion of the role of different types of studies in informing stakeholders about the comparative effectiveness of different medical tests and treatments.23,34,35 The complementary role of different types of evidence in making decisions is widely recognized as an important part of evidence-based medicine.36 A fundamental principle of CER is that different types of studies are needed to evaluate comparative effectiveness and to make comparisons relevant to patients, consumers, and other decision makers.

The question of the appropriate roles of different types of evidence is anything but new. In 1983, Alvan Feinstein put forth the argument for a broad CER agenda:

Although highly successful in investigating remedial therapy, randomized clinical trials have sometimes created rather than clarified controversy when the treatments were given for the complex problems involved in studying either the primary prevention of disease or the secondary prevention of adverse progression for an established disease. Another source of difficulty has been the inevitable conflicts created by two legitimate and justifiable but opposing policies regarding the fastidious or pragmatic goals of the trials. These problems limit the scope of clinical questions that can be answered successfully by randomized trials, but other limitations are produced by problems in logistics or ethics. Randomized trials are unfeasible for studying multiple therapeutic candidates, minor changes in therapy, “instabilities” due to rapid technologic improvements in available treatment, long-term adverse effects, studies of etiologic or other suspected “noxious” agents, and the diverse clinical roles of diagnostic technology. Consequently, despite the magnificent scientific achievements of randomized clinical trials, the foundation for a basic science of patient care will also require major attention to the events and observations that occur in the ordinary circumstances of clinical practice.37

In 1997, David Sackett and John Wennberg, regarded as founders of “evidence-based medicine” and “evaluative sciences,” respectively, teamed up to respond to a similar debate about the roles of randomized trials versus outcomes research. They called for “choosing the best research design for each question,” and they wrote that arguments “comparing, contrasting, attacking, and defending randomized control trials, outcomes research, qualitative research, and related research methods… has mostly been…a waste of time and effort.”38

As pointed out by Garrison and colleagues in a recent essay, it is helpful to distinguish between evidentiary standards and methodological standards for conducting research.39 The current debate about observational studies is focused on the standards of evidence used by regulators, guideline developers, and health plans to make decisions about market entry, insurance coverage, and practice recommendations. The debate is driven largely by anecdotes. Some of these illustrate how observational studies have been valuable in examining the effects of competing interventions in actual practice.40 Others show how, in some celebrated instances, relying on observational studies has yielded the wrong answer to fundamental questions of effectiveness, leading to wide adoption of ineffective or harmful practices.41

Unfortunately, this debate has done little to address the methodological needs related to the use of different types of research in CER, or to inform methodological standards for developing decision models or conducting observational and other types of comparative effectiveness studies. Many of the methodological research priorities described below (items 5 through 9) could improve the conduct, relevance, and logistics of both experimental and observational studies.

In addition, for nonexperimental methods (e.g., registries, observational designs), we believe a program of empiric research is needed to define the circumstances under which nonexperimental methods are informative, and how the internal validity of nonexperimental studies can be maximized. This program should test the robustness of new user designs, propensity scores, instrumental variables, and other strategies for avoiding or accounting for confounding by indication. Improving the validity of research designs, and developing standards to assess their validity, is most likely to emerge from integrating evaluation of novel clinical research methods into the CTSA’s clinical and community research programs.
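As one illustration of how confounding by indication distorts a naive comparison and how a propensity-score method can account for it, here is a minimal sketch on simulated data. It assumes a single binary confounder (disease severity) that drives both treatment and outcome, no unmeasured confounding, and positivity (every patient has some chance of either treatment). The propensity score is estimated as the treated fraction within each severity stratum, and the average effect is estimated by inverse probability weighting. All data and names are hypothetical; the sketch does not cover new-user designs or instrumental variables.

```python
import random

def simulate_confounded_cohort(n, seed=0):
    """Hypothetical cohort: severe patients are more likely to be treated and
    have worse outcomes. The true treatment effect on the outcome is +1."""
    rng = random.Random(seed)
    cohort = []
    for _ in range(n):
        severe = 1 if rng.random() < 0.5 else 0
        treated = 1 if rng.random() < (0.8 if severe else 0.2) else 0
        outcome = 1.0 * treated - 2.0 * severe + rng.gauss(0.0, 1.0)
        cohort.append((severe, treated, outcome))
    return cohort

def naive_difference(cohort):
    """Unadjusted difference in mean outcome, treated minus untreated."""
    treated = [y for _, t, y in cohort if t]
    control = [y for _, t, y in cohort if not t]
    return sum(treated) / len(treated) - sum(control) / len(control)

def propensity_scores(cohort):
    """P(treated | severe) as the observed treated fraction per stratum. With one
    binary confounder this is a saturated model; real studies would typically fit
    a regression model on many covariates."""
    counts = {}
    for severe, treated, _ in cohort:
        n, k = counts.get(severe, (0, 0))
        counts[severe] = (n + 1, k + treated)
    return {s: k / n for s, (n, k) in counts.items()}

def ipw_estimate(cohort):
    """Inverse-probability-weighted (normalized) average treatment effect."""
    ps = propensity_scores(cohort)
    num_t = den_t = num_c = den_c = 0.0
    for severe, treated, y in cohort:
        e = ps[severe]
        if treated:
            num_t += y / e
            den_t += 1 / e
        else:
            num_c += y / (1 - e)
            den_c += 1 / (1 - e)
    return num_t / den_t - num_c / den_c

if __name__ == "__main__":
    cohort = simulate_confounded_cohort(20_000, seed=1)
    print(f"naive: {naive_difference(cohort):+.2f}   IPW: {ipw_estimate(cohort):+.2f}")
```

In this simulation the naive comparison makes a beneficial treatment look slightly harmful (its expected value is -0.2), whereas the weighted estimate lands near the true effect of +1. The result holds only because the sole confounder is measured, which is exactly the condition that empiric research on nonexperimental designs needs to probe.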

5. Methods to better address the heterogeneity of treatment effects (HTE) in clinical studies.

Clinical research studies (not only randomized trials, but observational studies as well) often report only the average effect of compared treatments for the population studied. Clinicians must then decide how best to use those studies in diagnosing and treating individual patients. In many cases guidelines and recommendations are developed based on the studies. However, clinicians often prove unwilling to follow those guidelines, partly because they suspect their own patients differ from those in the studies. This suspicion is not unreasonable, as many clinical trials use patient samples that are unrepresentative of the broader population, due to exclusion of subjects with comorbidities, chronic conditions, and other factors.42 As a result, there has been pressure to make studies more applicable to the diversity of patients encountered in clinical practice, and in a broader range of clinical settings. For example, the National Institute of Mental Health has funded a series of “practical clinical trials” of medication effectiveness that were designed to better reflect real-world clinical practice, by applying fewer sample exclusions and tracking a broader range of outcomes over longer periods.43

Even in a study with a more diverse patient population, patients may differ from each other in ways that make the overall average treatment effect a misleading measure. Patients may respond differently to the treatment being studied, experience different side effects, or differ in their preferences, for example, regarding longevity/quality trade-offs. These differences are often described as HTE and have been documented as important for various treatments.44,45 Sources of heterogeneity can include the patient’s baseline risk, genetic profile, and disease severity, among other factors. In some cases, a treatment that is beneficial on average can be harmful to a subset of the trial population, particularly once side effects are considered. Conversely, a null finding for the whole sample could conceal benefits to subsamples.46

Although a framework for assessing and reporting HTE has been proposed,47 there is as yet no clear consensus on how best to cope with HTE in the context of clinical trials. A variety of methods are being developed to make randomized trials better able to address HTE. Subgroup analysis is the most commonly used approach. However, many randomized controlled trials are powered only to detect the main effect of the treatment studied, not differential effects among subgroups. Also, statisticians and others are suspicious of post hoc subgroup findings, as they may occur by chance, and in many cases have been reversed in subsequent research.48 Researchers, therefore, have been making a strong case for more use of subgroup analyses that are specified beforehand and adequately powered. Developing and validating techniques to improve risk assessment and power are therefore high priorities for CER. One approach increasingly used in oncology trials is to precede an RCT with an initial screening phase, which identifies the subgroup most likely to respond, who will then form the sample for the main trial (an “enrichment” design).49,50
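The power problem for subgroup analyses can be illustrated with a normal-approximation calculation. The Python sketch below compares the power to detect an overall treatment effect with the power to detect an interaction (a difference in effect between two subgroups) of the same size. It assumes a continuous outcome with a common standard deviation, 1:1 randomization, two equal-sized subgroups, and a two-sided z-test. The effect size and sample size are hypothetical.

```python
from statistics import NormalDist

Z = NormalDist()  # standard normal

def power(delta, se, alpha=0.05):
    """Two-sided power of a z-test for a true effect `delta` with standard error `se`."""
    z_crit = Z.inv_cdf(1 - alpha / 2)
    shift = abs(delta) / se
    return Z.cdf(shift - z_crit) + Z.cdf(-shift - z_crit)

def se_main(sigma, n_per_arm):
    # Difference of two arm means, each based on n_per_arm patients
    return sigma * (2 / n_per_arm) ** 0.5

def se_interaction(sigma, n_per_arm):
    # Each subgroup has n_per_arm/2 patients per arm, so each subgroup effect has
    # SE sigma*sqrt(4/n); the difference of two independent subgroup effects
    # therefore has SE sigma*sqrt(8/n), twice the main-effect SE.
    return sigma * (8 / n_per_arm) ** 0.5

sigma, delta, n = 1.0, 0.3, 235  # HYPOTHETICAL outcome SD, effect size, patients per arm
print(f"main effect power:           {power(delta, se_main(sigma, n)):.2f}")
print(f"interaction power:           {power(delta, se_interaction(sigma, n)):.2f}")
print(f"interaction power with 4x n: {power(delta, se_interaction(sigma, 4 * n)):.2f}")
```

With these hypothetical numbers, the trial has about 90% power for the main effect but under 40% for an equally large interaction. Quadrupling the sample restores it, which matches the familiar rule of thumb that interactions need roughly four times the sample size.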

Over the past 5 years, there has been rapid development in potentially important methods to use decision models and simulation to incorporate information about heterogeneity.51 Nevertheless, methods to incorporate analysis of heterogeneity and to identify characteristics of patients that are associated with higher or lower susceptibility to benefits and harms of alternative treatments are underused and underdeveloped. Outcome models with baseline covariates can be used to estimate treatment effects for individual patients.52,53 Further development of methods for N of 1 trials could improve the applicability of trial results to individual patients. In recent years, a number of studies have used multi-crossover designs in which the same patient is repeatedly randomized among treatments. This gives stronger confidence in the validity of any effect detected, but reduces generalizability since the effect is specific to that patient.54 Using Bayesian models to combine the results of N of 1 trials across patients merits further development and validation.55
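One way to combine N of 1 trials across patients is a hierarchical (random-effects) model in which each patient's true effect is drawn from a population distribution. The Python sketch below is a simplified empirical Bayes version of that idea, not a full Bayesian analysis. It estimates the between-patient variance with the DerSimonian-Laird moment estimator and then shrinks each patient's estimate toward the population mean, with more shrinkage for less precise estimates. The per-patient effects and standard errors are hypothetical.

```python
def pool_n_of_1(effects, ses):
    """Normal-normal random-effects pooling of per-patient N-of-1 estimates.

    effects: per-patient treatment-effect estimates (HYPOTHETICAL units)
    ses:     their standard errors from each patient's repeated crossovers
    Returns (population mean, between-patient variance tau2, shrunken effects).
    """
    w = [1 / s**2 for s in ses]                          # fixed-effect weights
    mu_fe = sum(wi * y for wi, y in zip(w, effects)) / sum(w)
    q = sum(wi * (y - mu_fe) ** 2 for wi, y in zip(w, effects))
    k = len(effects)
    c = sum(w) - sum(wi**2 for wi in w) / sum(w)
    tau2 = max(0.0, (q - (k - 1)) / c)                   # DerSimonian-Laird estimate

    w_re = [1 / (s**2 + tau2) for s in ses]              # random-effects weights
    mu = sum(wi * y for wi, y in zip(w_re, effects)) / sum(w_re)

    # Posterior mean per patient: precision-weighted average of own estimate and mu
    shrunk = [(tau2 * y + s**2 * mu) / (tau2 + s**2) if tau2 > 0 else mu
              for y, s in zip(effects, ses)]
    return mu, tau2, shrunk

effects = [2.1, 0.4, 1.5, -0.3, 1.2]   # HYPOTHETICAL per-patient effects
ses = [0.5, 0.6, 0.4, 0.8, 0.5]
mu, tau2, shrunk = pool_n_of_1(effects, ses)
print(round(mu, 2), round(tau2, 2), [round(x, 2) for x in shrunk])
```

A fully Bayesian version would place priors on the population mean and variance and propagate their uncertainty, which this plug-in sketch does not do.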

While engaging in the important problem of estimating differential treatment effects, we should not forget that most clinical trials do not use the available data effectively to estimate the average treatment effect. Covariate adjustment is one of the most advantageous and underutilized of statistical methods. Using this method to account for outcome heterogeneity across a diverse patient sample should be standard practice. Guidelines for doing so would be beneficial. Outcome models with baseline covariates can also be used to estimate treatment effects for individual patients.52,53
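The efficiency gain from covariate adjustment can be seen in a small simulation. The Python sketch below requires NumPy. It generates a hypothetical randomized trial with one strongly prognostic baseline covariate, fits ordinary least squares with and without that covariate, and compares the standard errors of the treatment coefficient. The data-generating values are assumptions chosen for illustration.

```python
import numpy as np

def ols_treatment_se(X, y):
    """Return (coefficient, standard error) for column 1 (treatment) of an OLS fit."""
    n, p = X.shape
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    sigma2 = resid @ resid / (n - p)               # residual variance
    cov = sigma2 * np.linalg.inv(X.T @ X)          # classical OLS covariance
    return beta[1], np.sqrt(cov[1, 1])

rng = np.random.default_rng(0)
n = 400
x = rng.normal(size=n)                             # baseline covariate (e.g., severity)
t = rng.integers(0, 2, size=n)                     # randomized treatment indicator
y = 1.0 + 0.5 * t + 2.0 * x + rng.normal(size=n)   # HYPOTHETICAL true effect 0.5

ones = np.ones(n)
est_u, se_u = ols_treatment_se(np.column_stack([ones, t]), y)
est_a, se_a = ols_treatment_se(np.column_stack([ones, t, x]), y)
print(f"unadjusted: {est_u:.2f} (SE {se_u:.3f})")
print(f"adjusted:   {est_a:.2f} (SE {se_a:.3f})")
```

Both estimates are unbiased under randomization. In this setup, however, the adjusted standard error is less than half the unadjusted one, because the covariate explains most of the outcome variance.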

Studies using observational data offer some advantages in addressing HTE. First, they often involve more representative populations than RCTs, making results more generalizable. Second, sample sizes are often larger, particularly if secondary data are used, allowing greater power to detect subgroup differences in treatment effects. These differences can be used to develop individual predictions based on a patient’s own characteristics. For example, many clinicians compute the risk of heart attack for individual patients by applying regression coefficients derived from the Framingham heart study to each patient’s own age, smoking status, and other characteristics.56 The risk estimate can then be used to discuss risk reduction options with the patient.57 Many observational studies of other treatments and diseases result in regression coefficients that can be used this way.
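Risk equations of this kind combine a patient's own characteristics with published coefficients. The Python sketch below shows the general Cox proportional hazards form used by many such equations: a t-year risk equals 1 minus the baseline survival raised to the exponent of the patient's linear predictor, centered at the cohort means. The coefficients, means, and baseline survival are made-up placeholders, not the Framingham values. A real application must use the published equation for the relevant outcome and population.

```python
import math

# HYPOTHETICAL coefficients, cohort means, and baseline survival (NOT Framingham values)
COEFS = {"age": 0.05, "smoker": 0.6, "systolic_bp": 0.015, "total_chol": 0.004}
MEANS = {"age": 50.0, "smoker": 0.3, "systolic_bp": 130.0, "total_chol": 210.0}
BASELINE_SURVIVAL_10Y = 0.95

def ten_year_risk(patient):
    """Cox-form risk: 1 - S0(t) ** exp(sum(b * (x - mean)))."""
    lp = sum(COEFS[k] * (patient[k] - MEANS[k]) for k in COEFS)
    return 1 - BASELINE_SURVIVAL_10Y ** math.exp(lp)

patient = {"age": 62, "smoker": 1, "systolic_bp": 145, "total_chol": 240}
print(f"estimated 10-year risk: {ten_year_risk(patient):.1%}")
print(f"if not smoking:         {ten_year_risk({**patient, 'smoker': 0}):.1%}")
```

The second line mimics the risk-reduction discussion described above. Such "what-if" comparisons implicitly treat the coefficients as causal, which is exactly the assumption that confounding can undermine.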

However, observational studies are commonly accorded less weight than RCTs, due to their major weakness: without randomization, the apparent effects of a treatment may be due to confounding of treatment with other unmeasured variables. For example, the mortality benefits of cardiac revascularization may be overstated if: 1) physicians systematically select for the procedure candidates who are less frail and 2) frailty is not measured in the resulting data. A variety of statistical approaches are available to address the confounding problem in observational studies, and several are relevant to HTE. Many of these methods have been available for longer than some of the methods for refining RCTs. However, they face greater barriers to acceptance among clinicians, in part because it is difficult to evaluate how successfully these methods can correct for confounding in a particular study—that is, by how much they reduce the risk of bias inherent in the design of the study. As noted above, we support a program of empiric research to refine and validate methods for observational studies. Studies that directly compare results of an intervention in randomized and nonrandomized populations and use the results to address HTE can make a unique contribution to CER by reducing uncertainty.58
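The revascularization example can be mimicked in a small simulation. In the Python sketch below, frailty is a binary confounder that lowers both the chance of receiving the procedure and survival, while the procedure itself has no effect. When frailty is recorded, inverse probability of treatment weighting removes the bias; here the propensity score is simply the treated fraction within each frailty stratum. The naive comparison, by contrast, suggests a spurious benefit. All probabilities and names are hypothetical.

```python
import random

def simulate_frailty_data(n=100_000, seed=7):
    """HYPOTHETICAL cohort of (frail, treated, died) records."""
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        frail = rng.random() < 0.3
        treated = rng.random() < (0.2 if frail else 0.6)   # frail patients selected less often
        # Death depends on frailty only: the procedure has NO true effect here
        died = rng.random() < (0.25 if frail else 0.05)
        data.append((frail, treated, died))
    return data

def naive_risk_difference(data):
    """Mortality in treated minus untreated, ignoring frailty."""
    tr = [d for _, t, d in data if t]
    un = [d for _, t, d in data if not t]
    return sum(tr) / len(tr) - sum(un) / len(un)

def ipw_risk_difference(data):
    """IPW estimate with propensity = treated fraction within each frailty stratum."""
    ps = {}
    for stratum in (False, True):
        group = [t for f, t, _ in data if f == stratum]
        ps[stratum] = sum(group) / len(group)
    num1 = den1 = num0 = den0 = 0.0
    for f, t, d in data:
        if t:
            w = 1 / ps[f]
            num1 += w * d
            den1 += w
        else:
            w = 1 / (1 - ps[f])
            num0 += w * d
            den0 += w
    return num1 / den1 - num0 / den0

data = simulate_frailty_data()
print(f"naive: {naive_risk_difference(data):+.3f}  IPW: {ipw_risk_difference(data):+.3f}")
```

The naive difference shows roughly 7 percentage points lower mortality with the procedure, entirely due to selection, while the weighted estimate is close to zero. The correction works only because frailty was measured, which is precisely what the text notes is often not the case.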

Adaptive study designs and incorporating preferences, value, and utility into the design of clinical studies also have the potential to better address heterogeneity.31 These topics are addressed in the following two sections.

6. Methods to increase the efficiency of comparative effectiveness trials, that is, to design and analyze trials in a way that permits adaptations based on preliminary or staged data collection. Adaptation refers to changes in study design (for example, changes in treatment protocol, timing of follow-up, and choice of measures) that may shorten the time needed to draw reliable conclusions, make the trial more relevant to “real-life” clinical situations, or offer the potential to measure the effectiveness of strategies that make greater use of individual-specific information, such as HTE. Trial adaptation methods include staged protocols, group sequential trials, and Bayesian adaptive trials.
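Part of the rationale for formal group sequential methods is that repeatedly testing accumulating data at a fixed nominal level inflates the false-positive rate. The Monte Carlo sketch below makes several hypothetical assumptions: five equally spaced looks, a null treatment effect, and a standardized z statistic built from equal-sized stages. It estimates how often at least one unadjusted two-sided test at 0.05 is "significant" and compares that rate with a constant Pocock-type critical value. The value 2.413 is the commonly tabulated Pocock constant for five looks at overall two-sided alpha 0.05; treat it as an assumption to check against a reference table.

```python
import math
import random

def any_crossing_rate(crit, looks=5, sims=100_000, seed=3):
    """Fraction of null trials in which |z| exceeds `crit` at any interim look."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(sims):
        s = 0.0
        for k in range(1, looks + 1):
            s += rng.gauss(0, 1)              # standardized data from one more stage
            if abs(s / math.sqrt(k)) > crit:  # z statistic on all data so far
                hits += 1
                break
    return hits / sims

naive_rate = any_crossing_rate(1.96)
pocock_rate = any_crossing_rate(2.413)
print(f"1.96 at every look: {naive_rate:.3f}")
print(f"Pocock-type 2.413:  {pocock_rate:.3f}")
```

The naive rate comes out near three times the nominal 5%, while the constant Pocock-type boundary holds it near 5%. This is the kind of adjustment that staged and group sequential protocols build in from the start.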

Trial designs adapted to real-life decision making are increasingly common. Real-world treatment often involves a series of sequential decisions, in which a clinician starts with one treatment and then uses the results to make a second decision, for example, whether to switch treatments. Some recent trials have set out to compare these “adaptive treatment strategies,” for example, the VANQWISH trial of invasive management versus conservative medical management to treat non-Q-wave myocardial infarction.59 Another variant is to randomize subjects at each of the stages, rather than only at the start of the treatment process, in a “sequential multiple assignment randomized” trial.60

7. Methods to incorporate preferences, value, resource use, and utility into the design of clinical research studies.

One of the mainstays of CER, patient-centeredness, means that decisions about care must reflect differences in the patients’ pathobiology (e.g., stage of disease, comorbidities, demographics), in the patients’ social settings (so-called contextual variables),61–63 in the availability of clinical resources (e.g., northern Alaska may have different facilities and personnel than Chicago), and in patients’ preferences or values.64 For almost half a century, clinicians, theoreticians, and clinical investigators have mused about how such preference variations might be incorporated into a rational decision for a patient lying in front of the clinician. That problem is far from solved and continues to demand a vigorous research agenda. It can be divided into three parts: 1) how do we determine and measure a patient’s preferences? 2) how should those preferences be incorporated into a personalized recommendation for that patient? 3) how should such recommendations be presented and explained? Even now, some half century after research in this arena began, we have neither validated techniques nor recognition of the need for them. In all likelihood, CER and its tenet of patient-centeredness will highlight this need and may help develop research toward a set of such techniques.
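The second question above (how preferences enter a personalized recommendation) is often framed in decision analysis as choosing the option with the highest expected utility. The Python sketch below uses hypothetical outcome probabilities for two treatments and two hypothetical patients who value the same outcomes differently, so the same evidence supports different recommendations. It does not address the first question (how to elicit the utilities); the utilities are simply assumed.

```python
# HYPOTHETICAL outcome probabilities for two treatment options
OUTCOME_PROBS = {
    "surgery":    {"cured": 0.80, "cured_with_side_effects": 0.15, "death": 0.05},
    "medication": {"cured": 0.60, "persistent_symptoms": 0.40},
}

# HYPOTHETICAL utilities (0 = death, 1 = full health) for two patients
PATIENTS = {
    # patient_a finds persistent symptoms very burdensome
    "patient_a": {"cured": 1.0, "cured_with_side_effects": 0.8,
                  "persistent_symptoms": 0.6, "death": 0.0},
    # patient_b tolerates persistent symptoms fairly well
    "patient_b": {"cured": 1.0, "cured_with_side_effects": 0.8,
                  "persistent_symptoms": 0.9, "death": 0.0},
}

def expected_utility(option, utilities):
    """Probability-weighted sum of the patient's utilities over the option's outcomes."""
    return sum(p * utilities[outcome] for outcome, p in OUTCOME_PROBS[option].items())

def recommend(utilities):
    """Option with the highest expected utility for this patient."""
    return max(OUTCOME_PROBS, key=lambda option: expected_utility(option, utilities))

for name, utils in PATIENTS.items():
    scores = {opt: round(expected_utility(opt, utils), 3) for opt in OUTCOME_PROBS}
    print(name, scores, "->", recommend(utils))
```

Here surgery scores 0.92 for both patients, while medication scores 0.84 for patient_a and 0.96 for patient_b, so the recommendation flips. Presenting and explaining such a result, the third question, remains a separate problem.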

As difficult as the task appears for the individual patient, it is even more complex when extended to recommendations or guidelines that must be applied to populations composed of individuals with a broad spectrum of preferences. This challenge can be divided into two parts: 1) how do we measure preferences and preference variation across a population (or perhaps across many populations); and 2) how might that variability be incorporated into guidelines that are sufficiently practical to be useful? The work of Nobelist Kenneth Arrow established that using average preferences (utilities) is not (even theoretically) a viable solution.65 This conundrum, the development and application of preference-sensitive guidelines,66 is a substantial theoretical and methodological challenge, worthy of a well-funded research agenda. So whether at the individual patient level or at the population level, CER will require substantial research (both practical and theoretical) to allow clinicians and policy makers to suggest choices that are sensitive to patients’ preferences or values. All such work will need to address preference elicitation and measurement, and preference incorporation into rational decisions.
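A classic illustration of why aggregating individual preferences is hard is the Condorcet paradox, a difficulty that Arrow's impossibility theorem generalizes. With three individuals ranking three options, pairwise majority preference can be cyclic, so no option is preferred by a majority to every other. The Python sketch below uses hypothetical rankings. Arrow's result is broader than this single example, which shows only one way a simple aggregation rule can fail.

```python
from itertools import combinations

# HYPOTHETICAL rankings of three management options (best first), one per patient
RANKINGS = [
    ["surgery", "medication", "watchful_waiting"],
    ["medication", "watchful_waiting", "surgery"],
    ["watchful_waiting", "surgery", "medication"],
]

def majority_prefers(a, b, rankings):
    """True if more individuals rank option a above option b than the reverse."""
    votes_a = sum(r.index(a) < r.index(b) for r in rankings)
    return votes_a > len(rankings) - votes_a

def condorcet_winner(rankings):
    """Option that beats every other option in pairwise majority votes, or None."""
    options = rankings[0]
    for a in options:
        if all(majority_prefers(a, b, rankings) for b in options if b != a):
            return a
    return None

for a, b in combinations(RANKINGS[0], 2):
    winner = a if majority_prefers(a, b, RANKINGS) else b
    print(f"{a} vs {b}: majority prefers {winner}")
print("Condorcet winner:", condorcet_winner(RANKINGS))
```

Surgery beats medication, medication beats watchful waiting, and watchful waiting beats surgery, so a population-level "majority preference" does not exist even though each individual's ranking is perfectly consistent.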

Various researchers are examining how patient preferences regarding outcomes could and should affect choice among treatments, and how to elicit those preferences.57,67,68 This developmental work is thought-provoking, but, like much of the science that underlies CER, has not been connected with the main body of community-based and other clinical research supported by NIH. The state of the science in this area can be accelerated by integration with the CTSA collaborative.

The CER methods agenda should also improve methods to consider resource use. No one disputes that health care decisions take place in the context of limited resources, but many object to including costs in the scope of CER. Garber has noted, first, that objections center on methodological concerns and, second, that “the deepest objections undoubtedly stem from anxiety about how policymakers and private payers would apply cost-effectiveness information.”69 Garber and others argue that awareness of cost does not necessarily mean reductions in the range of covered services or limitations on access; information about cost can be, and has been, used to increase access and improve value.70,71 Moreover, even the strongest advocates of including cost-effectiveness information in CER recommend that “cost should never be used as the sole criterion for evaluating a clinical intervention” and that it should always be accompanied by explicit, transparent consideration of comparative health benefits and harms.72
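Cost-effectiveness information is usually summarized in one of two ways. The first is an incremental cost-effectiveness ratio (ICER), the extra cost per extra unit of health benefit (often quality-adjusted life years, QALYs). The second is net monetary benefit at a chosen willingness-to-pay threshold. The Python sketch below uses hypothetical costs, QALYs, and threshold. Consistent with the caution quoted above, such numbers are one input alongside benefits and harms, not a sole criterion.

```python
def icer(cost_new, qaly_new, cost_old, qaly_old):
    """Incremental cost per QALY gained for the new strategy versus the old."""
    d_cost, d_qaly = cost_new - cost_old, qaly_new - qaly_old
    if d_qaly <= 0:
        # Simplification: dominance and "less effective" cases are not handled here
        raise ValueError("new strategy is not more effective; ICER is not informative")
    return d_cost / d_qaly

def net_monetary_benefit(cost, qaly, wtp):
    """QALYs valued at willingness-to-pay `wtp` per QALY, minus cost."""
    return wtp * qaly - cost

# HYPOTHETICAL per-patient values and threshold
old = {"cost": 20_000.0, "qaly": 6.0}
new = {"cost": 32_000.0, "qaly": 6.2}
WTP = 100_000.0  # currency units per QALY

print(f"ICER: {icer(new['cost'], new['qaly'], old['cost'], old['qaly']):,.0f} per QALY")
print("new preferred at this threshold:",
      net_monetary_benefit(**new, wtp=WTP) > net_monetary_benefit(**old, wtp=WTP))
```

Net monetary benefit avoids the awkward behavior of ratios when the QALY difference is small or negative, which is one reason it is often reported alongside the ICER.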

Leaving aside the policy debate, to the extent that individuals care about it, cost is an essential part of health care decision making and is inextricably connected with the concepts of value and preference. For methodologists in the fields of decision making, improving methods to understand and measure resource use and resource limitations is essential if we are to improve methods for incorporating preferences and values.

8. Statistical, analytical, epidemiological, and logistical methods for pragmatic trials. Pragmatic trials are clinical trials that are more consciously designed to be informative to patients, clinicians, payers, and policy makers. How can we systematically consider modifications to RCTs that respond to the information needs of patients, clinicians, and payers and that generate evidence more reflective of “real-world” impact? Examples include broadening patient eligibility, including relevant comparators, lessening standardization of experimental and control interventions, and including longer-term and patient-reported outcomes.

The emphasis on hypothesis testing and false positive rates (type I error; alpha spending) has reduced the efficiency of clinical research. In contrast, flexible Bayesian techniques will likely produce more evidence, more quickly and at lower cost. A simple example may suffice. Consider a randomized trial in which the evidence at the planned study termination is equivocal. Increasing the sample size may make the difference between a noninformative and an informative study. Traditional frequentist approaches to extending the study, even when there are absolutely no changes in the protocol or patient population, require a multiplicity adjustment, effectively down-weighting the initial wave of subjects. Bayesian updating, on the other hand, would treat the new data as merely updating the evidence from the initial wave, without complex adjustments. When the protocol changes, treatment arms are dropped, and so on, frequentist solutions are even more problematic, whereas a Bayesian analysis uses the same statistical procedure throughout. The distinction between reverse-time probabilities of false positives (which require incorporation of investigator intentions) and forward-time posterior probabilities (which update evidence as frequently as desired, assuming that the prior distribution remains fixed and that no data are discarded along the way) is an important one. Bayesian approaches also have great promise in speeding the launch of a clinical trial, by recognizing that clinical knowledge regarding dosing, patient inclusion criteria, and choice of comparators evolves over the course of the study.
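The extension scenario can be sketched with a conjugate beta-binomial model for the response rate in each arm. The Python sketch below uses hypothetical counts from an initial wave and an extension wave, and it assumes a flat Beta(1, 1) prior that stays fixed with no data discarded, as the text stipulates. Updating with wave 1 and then wave 2 gives exactly the same posterior as analyzing all patients at once, so no multiplicity adjustment enters the posterior. The posterior probability that the new treatment has the higher response rate is then estimated by Monte Carlo.

```python
import random

def update(prior, responders, n):
    """Conjugate update of a Beta(a, b) prior with binomial data."""
    a, b = prior
    return (a + responders, b + n - responders)

def prob_better(post_new, post_ctl, draws=100_000, seed=11):
    """Monte Carlo estimate of P(rate_new > rate_ctl) under independent Beta posteriors."""
    rng = random.Random(seed)
    wins = sum(rng.betavariate(*post_new) > rng.betavariate(*post_ctl) for _ in range(draws))
    return wins / draws

PRIOR = (1, 1)  # flat prior (an assumption)
# HYPOTHETICAL data: (responders, patients) per arm in each wave
wave1 = {"new": (18, 40), "ctl": (13, 40)}
wave2 = {"new": (20, 40), "ctl": (14, 40)}

post = {arm: update(update(PRIOR, *wave1[arm]), *wave2[arm]) for arm in ("new", "ctl")}
pooled = {arm: update(PRIOR, wave1[arm][0] + wave2[arm][0], wave1[arm][1] + wave2[arm][1])
          for arm in ("new", "ctl")}
assert post == pooled  # sequential and all-at-once analyses coincide

print("posterior after wave 1 + wave 2:", post)
print(f"P(new > control | data) = {prob_better(post['new'], post['ctl']):.3f}")
```

With these hypothetical counts the posterior probability is about 0.96. The same two functions would simply be applied again if a third wave were added.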

9. Methods to recruit and retain patients/populations excluded or underrepresented in current trials and other research.

It is widely recognized that specific subpopulations of patients have been underrepresented in past clinical research, most notably patients with multiple significant comorbidities, ethnic and racial minorities, patients at the ends of the age spectrum, and rural populations. The successful implementation of CER will depend in part on significantly improving the methods and strategies employed to increase the proportion of study subjects from these neglected subpopulations. One of the primary defining characteristics of CER described by the IOM in its 2009 report is the need to generate evidence that is applicable to clinical decisions relevant to individual patients and patient subgroups, and this cannot be achieved if there are consistent and predictable gaps in the range of patients enrolled in prospective clinical studies.3

As a starting point for further work in this area, it would be useful to conduct a systematic review of published literature and best practices related to documentation of the barriers to inclusion of historically excluded populations, as well as interventions intended to improve representativeness in clinical research. This would provide a helpful baseline understanding of what is already known in this domain, and where there are gaps in knowledge about barriers, or interventions to overcome them.

Building on this knowledge base, it will be necessary to begin a focused, systematic body of work to fill any gaps in knowledge about the barriers to enrollment of underrepresented patient populations, and to conduct well-designed evaluations of promising interventions intended to increase enrollment of minority, vulnerable, and historically underrepresented patient populations in trials. It is critical that this work is pursued in a sustained and organized way, guided by the best available thinking on this topic. Many individual investigators have made efforts to improve patient recruitment in specific trials, but this has frequently been addressed as a secondary or tertiary aim within studies whose primary aims are not focused on improving patient recruitment.

It is likely that some strategies to increase the range of patients included in trials will be specific to certain clinical conditions. For example, there are specific challenges associated with studying patients with chronic wounds who permanently reside in nursing homes, many of whom have severe dementia or other serious comorbidities that complicate their enrollment in clinical studies. The strategies necessary to overcome these challenges may not generalize broadly to other clinical conditions or research settings, though recurring themes would be expected to be empirically documented as more research of this type is completed.

Serious consideration must also be given to the possibility that improving recruitment will require dedicated resources, including financial incentives for participation targeted to historically underrepresented groups. While a number of important barriers may be overcome with low-cost and low-intensity interventions, improvements that are substantial and rapid are more likely to require significant effort and resources. Improving recruitment in commercially funded trials is achieved in part by increasing the financial incentives for clinicians and patients to enroll in trials, and it will be difficult for publicly funded research to significantly enhance enrollment without creating meaningful incentives. In the context of CER, these incentives could take a wide range of forms, including benefit designs and provider payment schemes that encourage participation in CER studies. The use of such incentives could become a topic for a body of empirical research aimed at determining the impact of these incentives on enrollment of historically neglected patient subpopulations.

Critical Need 3: Improve clinical and health policy decision making.

10. Methods to evaluate the optimal and timely translation of CER results into the community and practice.

CER, like any other form of research, will need to be translated into practice to realize its goal of improving health care delivery and the health of the nation. Most of the debate about CER has assumed that the obvious translational approach will be through policy initiatives (i.e., coverage decisions). The Affordable Care Act of 2010 established the independent, nonprofit Patient-Centered Outcomes Research Institute (PCORI) to provide direction and oversight to this emerging enterprise. One challenge for PCORI will be to effectively address the concern that CER could be used to “choose a winner” in formularies and payers’ coverage decisions. CER, however, often highlights the differential features of the tests and interventions being compared, providing incremental evidence, in terms of validity, precision, and applicability, that informs these features. Furthermore, CER results may identify heterogeneity of outcomes and preferences. Thus, the translational challenge is to identify how best to convey these results to those making decisions about tests and therapies.73 Methods to compare policy-based approaches (e.g., coverage decisions) with practice-based methods (e.g., shared decision-making) in translating CER into practice are also needed.

The features of alternative courses of action, that is, the results of CER, would have different implications for different patient groups: those with a particular distribution of values and preferences (e.g., religious groups, patients at the end of life), those within particular contexts (e.g., institutionalized patients), and those with a particular distribution of comorbidities may consider those features differently, such that options desirable for some may be undesirable for others.74 Methods that consider how these features inform quality-of-care metrics and practice guidelines for these patient groups are necessary.

We have identified the need for stakeholder input into the questions of greatest concern for CER. In addition to question generation, it will be important to develop and evaluate methods that will enable end users (those making decisions using CER) to feed back information about existing challenges for CER translation to researchers conducting original or synthetic CER research.

Finally, approaches to translating CER into practice should themselves be the subject of evaluation and CER. Such approaches would include comparisons of treatment programs that include specific combinations of tests and therapies (e.g., the COURAGE trial for patients with angina75), quality improvement methods, and disease management programs.76 The appropriate methods to conduct such CER remain embryonic and require further exploration and experience.


This article has reviewed some of the methodological challenges in going from the status quo to a new approach for developing, conducting, understanding, and using the results of research studies. The ultimate test of the value of CER is whether it improves clinical decisions and outcomes. The implication is that, to succeed, CER must become a feedback loop connecting what happens in practice to decisions regarding actions to improve outcomes, either by designing new studies to provide additional information, or by using existing information more effectively to improve care. Such feedback loops characterize processes of quality improvement and imply mature systems to measure and monitor processes.77

The next step for the CTSA program is to develop a road map to go from what methods areas need to be strengthened to how to do it. There is a real and unanswered question of how to fund and encourage methodological research while at the same time fulfilling the primary purpose of the CER initiative: conducting original research to address important questions. Historically, scientists in clinical epidemiology, biostatistics, and decision-making who can advance methods have grafted small methods projects onto their more fundable clinical projects. In many of the areas outlined above, methods research has been funded by small, separate grants to work with datasets from small clinical research studies, producing interesting but often idiosyncratic results that have not been validated in a broader range of important clinical research.

The organization of the CTSAs lends itself to a robust solution to this problem: integrating methodological research into large, prospective CER studies.78 Some of the most important innovations in study design and statistical analysis were developed in the context of large, multicenter, cooperative clinical and epidemiologic studies and trials conducted from the 1950s to the present day. A program to link methods research to the broader agenda for CER is needed to accelerate the development and refinement of better methods. Insights generated by this work should be of enormous value to the methods committee of PCORI, and to the broad range of organizations that will be funding and implementing CER.

The CTSAs should also attempt to take advantage of opportunities for natural experiments, in addition to funding explicit comparisons between methodological approaches, such as results from an observational comparative effectiveness design compared with a clinical trial. For example, using existing datasets to fund an observational study that addresses some of the same aims as a pragmatic clinical trial could provide interesting empirical evidence to further inform decisions by funders and researchers as to which circumstances warrant the additional time and expense of conducting a trial.


  • 1

    They wrote: “Lots of intellectual and emotional energy, ink, paper, and readers’ precious time have been expended comparing, contrasting, attacking, and defending randomized control trials, outcomes research, qualitative research, and related research methods. This has mostly been a waste of time and effort, and most of the disputants, by focusing on methods rather than questions, have been arguing about the wrong things.”


This project has been funded in whole or in part with Federal funds from the National Center for Research Resources, NIH, through the CTSA, a trademark of DHHS, part of the Roadmap Initiative, “Re-Engineering the Clinical Research Enterprise.” The manuscript was approved by the CTSA Consortium Publications Committee.


Appendix: Definitions and Defining Characteristics of CER

Definition from the IOM Report on Priorities for CER

“The generation and synthesis of evidence that compares the benefits and harms of alternative methods to prevent, diagnose, treat, and monitor a clinical condition or to improve the delivery of care. The purpose of CER is to assist consumers, clinicians, purchasers, and policy makers to make informed decisions that will improve health care at both the individual and population levels.”

The IOM identified a number of key characteristics of CER, including:

  • The objective of CER is to inform clinical or health policy decisions.
  • Compares at least two alternatives, each with potential to be best practice.
  • Results are analyzed at population and subgroup level.
  • Measures outcomes important to patients.
  • Methods and data sources carefully chosen to be appropriate for the decision of interest.
  • Conducted in real-world settings.

Definition from the FCC-CER Report

“CER is the conduct and synthesis of research comparing the benefits and harms of different interventions and strategies to prevent, diagnose, treat and monitor health conditions in ‘real world’ settings. The purpose of this research is to improve health outcomes by developing and disseminating evidence-based information to patients, clinicians, and other decision-makers, responding to their expressed needs, about which interventions are most effective for which patients under specific circumstances.

  • To provide this information, CER must assess a comprehensive array of health-related outcomes for diverse patient populations and subgroups.
  • Defined interventions compared may include medications, procedures, medical and assistive devices and technologies, diagnostic testing, behavioral change, and delivery system strategies.
  • This research necessitates the development, expansion, and use of a variety of data sources and methods to assess comparative effectiveness and actively disseminate the results.”