Randomized controlled trials (RCTs) often aim to determine the efficacy of a treatment or intervention under ideal conditions. Although many randomized trials also investigate the effectiveness of an intervention in pragmatic studies, results from observational studies may be more commonly used to measure effectiveness by assessing the effects in 'real world' scenarios. The Institute of Medicine defines comparative effectiveness research (CER) as: "the generation and synthesis of evidence that compares the benefits and harms of alternative methods to prevent, diagnose, treat, and monitor a clinical condition or to improve the delivery of care. The purpose of CER is to assist consumers, clinicians, purchasers, and policy makers to make informed decisions that will improve health care at both the individual and population levels" (Institute of Medicine 2009). CER has also been called "comparative clinical effectiveness research" and "patient centered outcomes research" (Kamerow 2011). Regardless of what this type of research is called, it should give an unbiased estimate of whether one treatment is more effective or safer than another for a particular population.
Numerous study designs and modifications of existing designs, both randomized and observational, are being used for CER. These include, but are not limited to, head to head randomized trials, cluster randomized trials, adaptive designs, practice/pragmatic trials, “practice based evidence for clinical practice improvement” (PBE-CPI), natural experiments, observational or cross-sectional studies of registries, and databases including electronic medical records, meta-analysis, network meta-analysis, modelling and simulation. Modifications can often include newer observational study analysis approaches employing so-called causal inference techniques, which can include instrumental variables, marginal structural models, and propensity scores, among others. As noted in the Cochrane Handbook for Systematic Reviews of Interventions, potential biases for non-randomized studies are likely to be greater than for randomized trials (Higgins 2008). A systematic analysis of study design features, risk of bias, parameter interpretation, and effect size for all the types of studies used for CER is needed to identify specific differences in design types and potential biases.
This review summarizes the results of methodological studies that compare the outcomes of observational studies with randomized trials addressing the same question, as well as methodological studies that compare the outcomes of different types of observational studies. Debate about the validity of observational studies versus randomized trials for estimating effectiveness of interventions has gone on for decades. A number of reviews comparing the effect sizes and/or biases in RCTs and observational studies have been conducted (Benson 2000; Britton 1998; Concato 2000; Deeks 2003; Ioannidis 2001; Kunz 1998; Kunz 2002; MacLehose 2000; Odgaard-Jensen 2011; Oliver 2010; Sacks 1982; Wilson 2001). These reviews examine whether certain types of study designs over- or underestimate treatment effects, or change the direction of effects. Some reviews found that a lack of randomization or inadequate randomization is associated with selection bias, larger treatment effects, smaller treatment effects, or reversed direction of treatment effects (Deeks 2003; Ioannidis 2001; Kunz 1998; Odgaard-Jensen 2011), while others found little to no difference in treatment effect sizes between study designs (Benson 2000; Britton 1998; Concato 2000; MacLehose 2000; Oliver 2010). However, there has been no systematic review of comparisons of all study designs currently being used for comparative effectiveness research. Reviews that compared RCTs to observational studies most often limited the comparison to cohort studies, or the types of observational designs included were not specified. In addition, most of the reviews were published between 1982 and 2003 and the methodology for observational studies has evolved since that time. One review, first published in 2002 (Kunz 2002), has been archived and superseded by later versions.
The most recent version of that review, published in 2011, compared random allocation versus non-random allocation or adequate versus inadequate/unclear concealment of allocation in randomized trials (Odgaard-Jensen 2011). This review included comparisons of randomized trials ('randomised controlled trials' or 'RCTs'); non-randomized trials with concurrent controls; and non-equivalent control group designs. This review excluded comparisons of studies using historical controls (patients treated earlier than those who received the intervention that is being evaluated, frequently called 'historically controlled trials' or 'HCTs'); and classical observational studies, including cohort studies, cross-sectional studies, case-control studies and 'outcomes studies' (evaluations using large administrative or clinical databases). Another recent review assessing the relationship between randomized study designs and estimates of effect has focused only on policy interventions (Oliver 2010).
Why it is important to do this review
Despite the need for rigorous comparative effectiveness research, there has been no systematic comparison of effect measure estimates among all the types of study designs used for comparative effectiveness research. The findings of this review will inform the design of future comparative effectiveness research and help prioritize the types of context-specific study designs that should be used to minimize bias.
To assess the impact of study design (including RCTs versus observational study designs and different types of observational studies, e.g. cohort versus case-control) and/or choice of analytic technique (e.g. an odds ratio (OR) from logistic regression versus an OR estimated from a marginal structural model) on the effect measures estimated in observational and randomized studies.
To explore methodological variables that might explain any differences identified. Effect size estimates may be related to the underlying risk of bias (i.e., methodological variables) of the studies, and not the design per se. A flawed RCT may have larger effect estimates than a rigorous cohort study, for example. If the methodological studies we include assess the risk of bias of the study designs they include, we can attempt to see if the differences in risk of bias explain any differences in effect size estimates.
To identify gaps in the existing research comparing study designs.
Criteria for considering studies for this review
Types of studies
We will examine systematic and non-systematic reviews that are designed as methodological studies to compare quantitative effect size estimates measuring efficacy or effectiveness of interventions of trials with observational studies or different designs of observational studies. Comparisons will include randomized controlled trials and observational studies (potentially including, but not limited to, retrospective cohorts, prospective cohorts, case controls, and cross-sectional designs) that compare effect measures from different study designs or analyses. For the purposes of this review, we will use non-experimental studies and observational studies interchangeably, because the only non-experimental studies we will be analyzing will be observational in design. However, it should be noted that the terminology describing study designs is not consistent and can lead to confusion. We will provide a glossary defining the types of study designs that are identified for inclusion in this review.
We will include methodological studies comparing head-to-head randomized trials, cluster randomized trials, adaptive designs, practice/pragmatic trials, PBE-CPI, natural experiments, prospective and retrospective cohort studies, case control studies, observational or cross-sectional studies of registries and databases including electronic medical records, or observational studies employing so-called causal inference techniques (i.e. analytical techniques that attempt to estimate a true causal relationship from observational data), which can include instrumental variables, marginal structural models, and propensity scores. Specifically, we intend to include comparisons of estimates from RCTs with any of the above types of observational study, or comparisons of different types of observational studies.
Our focus is on studies of effectiveness of interventions. We will exclude comparisons of study designs for studies that are measuring only harms or diagnostic tests, as well as studies measuring risk factors or exposures to potential hazards. We will exclude studies that compare randomized trials to non-randomized trials. For example, we will exclude studies that compare studies with random allocation to those with non-random allocation or trials with adequate versus inadequate/unclear concealment of allocation. We will also exclude studies that compare the results of meta-analyses with the results of trials or observational studies. We will exclude meta-analyses of the effects of an intervention that include both randomised trials and observational studies with an incidental comparison of the results.
Types of data
It is our intention to select studies that quantitatively compare the efficacy or effectiveness of alternative interventions to prevent or treat a clinical condition or to improve the delivery of care. Specifically, our study sample will include studies that have effect estimates from RCTs or cluster randomized trials and observational studies, which will include but not be limited to cohort studies, case control studies, and cross-sectional studies.
Types of methods
We expect studies comparing effect measures between trials and observational studies or different types of observational studies to include the following.
- RCTs/cluster randomized trials versus prospective/retrospective cohorts
- RCTs/cluster randomized trials versus case control studies
- RCTs/cluster randomized trials versus cross sectional studies
- RCTs/cluster randomized trials versus other observational design
- Any type of observational design versus a different type of observational design
Types of outcome measures
We will describe the direction and magnitude of effect estimates (e.g. odds ratios, risk ratios, risk difference).
Search methods for identification of studies
See the Cochrane Library's section on Cochrane Review Groups for a description of search methods used by the Cochrane Methodology Review Group.
To identify relevant methodological studies we will search the following electronic databases, in the period from 01 January 1990 to the search date:
- Cochrane Methodology Register
- Cochrane Database of Systematic Reviews
- MEDLINE (via PubMed)
- EMBASE (via EMBASE.com)
- Literatura Latinoamericana y del Caribe en Ciencias de la Salud (LILACS)
- Web of Science/Web of Social Science
Along with MeSH terms and a wide range of relevant keywords, we will use the sensitivity-specificity balanced version of a validated strategy to identify studies in PubMed (Montori 2004), augmented with one term ("review" in article titles) so that it will better target narrative reviews. We anticipate that this strategy will retrieve all relevant narrative reviews. See Appendix 1 for our draft PubMed search strategy, which we will modify as appropriate for use in the other databases.
The search strategy will be iterative, in that we will search references of included studies for additional references. We will use the "similar articles" and "citing articles" features of several of the databases to identify additional relevant articles. We will include all languages.
Prior to executing the electronic searches, the search strategy will be peer reviewed by a second information specialist, according to the Peer Review of Electronic Search Strategies (PRESS) guidance (Sampson 2009).
Data collection and analysis
We will base the methodology for data collection and analysis on the guidance of the Cochrane Handbook (Higgins 2008). Two authors (AA and LB), working independently, will examine abstracts of all studies identified by electronic or bibliographic scanning. Where necessary, we will obtain the full text to determine the eligibility of studies for inclusion.
Selection of studies
After removing duplicate references, one author (TH) will make the first broad cut of these results, excluding those that were clearly irrelevant (e.g. animal studies, editorials, case studies).
Two authors (AA and LB) will then independently select potentially relevant studies by scanning the titles, abstracts, and descriptor terms of the remaining references and apply the inclusion criteria. We will discard irrelevant reports, and obtain the full article or abstract for all potentially relevant or uncertain reports. The two authors will independently apply the inclusion criteria. We will review studies for relevance, based on study design, types of methods employed, and a comparison of effects based on different methodologies or designs. TH will adjudicate any disagreements that cannot be resolved by discussion.
Data extraction and management
After an initial search and article screening, two authors will independently double-code and enter information from each selected study onto standardized data extraction forms. Extracted information will include the following:
- Study details: citation, start and end dates, location, study design (systematic review or other), eligibility criteria (inclusion and exclusion), study designs compared within study, interventions compared.
- Comparison of methods details: effect estimates from each study design within each publication.
- Outcome details: primary outcomes identified in each study.
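The extraction form described above could be represented as a simple record type. The following is a minimal sketch only: the class names, field names, and the `discrepancies` helper are hypothetical illustrations, not the form the review authors will actually use.

```python
from dataclasses import dataclass, field, fields
from typing import List


@dataclass
class EffectEstimate:
    """One effect estimate reported for one study design (hypothetical layout)."""
    study_design: str   # e.g. "RCT", "cohort"
    measure: str        # e.g. "OR", "RR"
    estimate: float
    ci_lower: float
    ci_upper: float


@dataclass
class ExtractionForm:
    """Fields mirror the three groups of extracted information listed above."""
    # Study details
    citation: str
    start_date: str
    end_date: str
    location: str
    review_type: str    # "systematic review" or "other"
    eligibility_criteria: str
    designs_compared: List[str] = field(default_factory=list)
    interventions_compared: List[str] = field(default_factory=list)
    # Comparison of methods details
    effect_estimates: List[EffectEstimate] = field(default_factory=list)
    # Outcome details
    primary_outcomes: List[str] = field(default_factory=list)


def discrepancies(coder_a: ExtractionForm, coder_b: ExtractionForm) -> List[str]:
    """Return the names of fields on which two independent coders disagree."""
    return [
        f.name
        for f in fields(ExtractionForm)
        if getattr(coder_a, f.name) != getattr(coder_b, f.name)
    ]
```

With double-coding, a helper such as `discrepancies` would flag the fields that need to be resolved by discussion before the data are analysed.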
Assessment of risk of bias in included studies
We are including systematic and non-systematic reviews of studies; therefore, the Cochrane Collaboration tool for assessing the risk of bias for individual studies does not apply. We will use the following criteria to appraise the risk of bias of included studies, which are similar to those used in the methodology review by Odgaard-Jensen and colleagues (Odgaard-Jensen 2011).
- Were explicit criteria used to select the studies?
- Did two or more investigators agree regarding the selection of studies?
- Was there a consecutive or complete sample of studies?
- Was the risk of bias of the included studies assessed?
- Did the review control for methodological differences of included studies (for example, with a sensitivity analysis)?
- Did the review control for heterogeneity in the participants and interventions in the included studies?
- Were similar outcome measures used in the included studies?
- Is there a risk of selective reporting?
- Is there evidence of bias from other sources?
We will rate each criterion as adequate, inadequate, or unclear.
We will summarize the overall risk of bias of each study as: low risk of bias, unclear risk of bias, or high risk of bias.
Measures of the effect of the methods
In general, we anticipate outcome measures to include, but not be limited to, risk ratios or rate ratios, odds ratios, hazard ratios (HR), and risk differences (RD).
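For readers less familiar with these measures, the sketch below shows how a risk ratio, odds ratio, and risk difference relate to the same two-by-two table. It is illustrative only: the function name and the example counts are made up, and rate ratios and hazard ratios are not covered because they require person-time or time-to-event data.

```python
def effect_measures(events_t, total_t, events_c, total_c):
    """Risk ratio, odds ratio and risk difference from a 2x2 table.

    events_t / total_t: events and participants in the treated group
    events_c / total_c: events and participants in the control group
    Assumes no zero cells (no continuity correction is applied).
    """
    risk_t = events_t / total_t
    risk_c = events_c / total_c
    odds_t = events_t / (total_t - events_t)
    odds_c = events_c / (total_c - events_c)
    return {
        "RR": risk_t / risk_c,   # risk ratio
        "OR": odds_t / odds_c,   # odds ratio
        "RD": risk_t - risk_c,   # risk difference
    }


# Hypothetical counts: 10/100 events with treatment, 20/100 with control.
example = effect_measures(10, 100, 20, 100)
```

In this made-up example the risk ratio is 0.5 while the odds ratio is about 0.44, which illustrates why estimates on different scales should not be compared directly across study designs.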
Dealing with missing data
This review is a secondary data analysis and will likely not incur missing data issues seen in most systematic reviews. However, if we encounter a scenario where we need more information from the publishing authors regarding methods or otherwise, we will contact the corresponding authors.
Assessment of heterogeneity
Though unlikely, if we decide we are able to synthesize data from multiple studies, we will examine heterogeneity among all studies using the χ² test.
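A χ² test of heterogeneity is commonly computed as Cochran's Q on inverse-variance weighted log effect estimates. The sketch below assumes that formulation; the function name is hypothetical, and the I² statistic is included only as a widely used companion measure, not because the protocol specifies it. No p-value is computed here, so Q would still need to be compared against a χ² distribution with the returned degrees of freedom.

```python
def cochran_q(log_estimates, standard_errors):
    """Cochran's Q, its degrees of freedom, and I-squared (percent).

    log_estimates: log effect estimates (e.g. log risk ratios), one per study
    standard_errors: standard errors of those log estimates
    """
    weights = [1.0 / se ** 2 for se in standard_errors]
    pooled = sum(w * y for w, y in zip(weights, log_estimates)) / sum(weights)
    # Q is the weighted sum of squared deviations from the pooled estimate.
    q = sum(w * (y - pooled) ** 2 for w, y in zip(weights, log_estimates))
    df = len(log_estimates) - 1
    # I-squared: share of total variation attributed to heterogeneity.
    i_squared = max(0.0, (q - df) / q) * 100.0 if q > 0 else 0.0
    return q, df, i_squared
```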
Assessment of reporting biases
We will attempt to minimise the potential for publication bias by our comprehensive search strategy that will include evaluating published and unpublished literature.
It is our intention to examine the relationship between study design type and the affiliated estimates. Using results from RCTs as the reference group, or cohort studies as the reference group for comparisons among observational designs, we intend to examine the published estimates to see whether there is a relative under- or over-estimation of the risk ratio. That is, we will determine whether the comparators show about the same effects, larger treatment effects, smaller treatment effects, or reversed direction of treatment effects compared to the reference group. Furthermore, we will qualitatively describe the reported results from studies for each comparison in tables. Using methods described by Altman 2003, we will estimate the ratios of risk ratios (RRR) if we have multiple studies reporting similar data. It is likely that our results will vary considerably by comparison groups, outcomes, interventions, and study design, which will contribute greatly to heterogeneity and prevent us from pooling results in a meta-analysis.
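The ratio of risk ratios can be sketched as follows, assuming the standard approach of comparing two independent estimates on the log scale: the difference of the log risk ratios, with a standard error equal to the square root of the sum of the squared standard errors. Function names, the 95% confidence level, and the `tolerance` threshold used to label effects as "about the same" are illustrative assumptions; the protocol does not define such a threshold.

```python
import math

Z_95 = 1.96  # normal quantile for a 95% confidence interval


def se_from_ci(lower, upper):
    """Standard error of a log ratio, recovered from its 95% CI."""
    return (math.log(upper) - math.log(lower)) / (2 * Z_95)


def ratio_of_risk_ratios(rr_comp, ci_comp, rr_ref, ci_ref):
    """RRR of a comparator design against the reference design.

    rr_comp, rr_ref: risk ratios; ci_comp, ci_ref: (lower, upper) 95% CIs.
    Assumes the two estimates are independent.
    """
    log_rrr = math.log(rr_comp) - math.log(rr_ref)
    se = math.sqrt(se_from_ci(*ci_comp) ** 2 + se_from_ci(*ci_ref) ** 2)
    return {
        "rrr": math.exp(log_rrr),
        "ci": (math.exp(log_rrr - Z_95 * se), math.exp(log_rrr + Z_95 * se)),
        "z": log_rrr / se,
    }


def classify(rr_comp, rr_ref, tolerance=0.10):
    """Label the comparator effect relative to the reference effect.

    tolerance (on the log scale) is a hypothetical cut-off for
    'about the same'; it is not taken from the protocol.
    """
    log_c, log_r = math.log(rr_comp), math.log(rr_ref)
    if log_c * log_r < 0:
        return "reversed"
    if abs(log_c - log_r) <= tolerance:
        return "about the same"
    return "larger" if abs(log_c) > abs(log_r) else "smaller"
```

For example, `ratio_of_risk_ratios(0.5, (0.3, 0.8), 0.8, (0.6, 1.1))` would compare a hypothetical cohort estimate against a hypothetical RCT estimate; an RRR below 1 means the comparator shows a risk ratio further below that of the reference.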
Subgroup analysis and investigation of heterogeneity
Reducing bias in comparative effectiveness research is particularly important for studies comparing pharmacological interventions with their implications for clinical care and healthcare purchasing. Since we anticipate that a number of the studies comparing study designs used for comparative effectiveness research may focus on drug comparisons, we plan to conduct an a priori subgroup analysis of these drug studies. Specifically, we hypothesize that studies of drug comparisons in a randomized design may have smaller effect estimates than studies of drug comparisons in a non-randomized design.
Furthermore, we will explore the effect of topic area on the differences of direction and magnitude of effect between study designs or analytical approaches. Specifically, for each type of comparison we will compare RRRs, but we expect that the topic area can affect the effect estimate in an unknown direction. We will compare RRRs between topic areas in an attempt to look at the differences of effect within topic areas. Additionally, we aim to perform a subgroup analysis by heterogeneity of the included methodological studies. That is, we want to compare the differences between RCTs and observational studies in the subgroup of methodological studies with high heterogeneity (as measured in their respective meta-analyses) with those in the subgroup with moderate-low heterogeneity. Finally, we intend to explore the possibility of confounding by indication by performing subgroup analyses that highlight differences in interventions and conditions. Specifically, we will subgroup studies by the same intervention (yes/no) and subgroup studies by same conditions (yes/no).
In the unlikely event that we pool results, the results will likely be heterogeneous, and we will conduct sensitivity analyses to identify studies with outlying results for further examination. We will identify outlying studies by the presence of an obvious source of heterogeneity or bias. We will estimate the results with and without the outlying studies. One potential, obvious source of heterogeneity may come from pooled estimates from case control studies. While case control studies are ideal for studies of rare diseases, they are not necessarily ideal for exploring short-term intervention effects. In the event we identify studies that report results from case control studies compared to results from RCTs, we will perform a sensitivity analysis excluding the case control studies to see how their inclusion would affect the summary estimates.
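The "with and without" comparison described above can be sketched as below. This is an illustration under stated assumptions: it uses fixed-effect inverse-variance pooling, which the protocol does not specify, and the function names and the dictionary keys (`design`, `log_est`, `se`) are hypothetical.

```python
import math


def pool_fixed(log_estimates, standard_errors):
    """Fixed-effect inverse-variance pooled ratio with a 95% CI."""
    weights = [1.0 / se ** 2 for se in standard_errors]
    pooled = sum(w * y for w, y in zip(weights, log_estimates)) / sum(weights)
    se = math.sqrt(1.0 / sum(weights))
    return math.exp(pooled), (math.exp(pooled - 1.96 * se),
                              math.exp(pooled + 1.96 * se))


def with_and_without(studies, excluded_designs):
    """Pool all studies, then pool again after dropping the given designs.

    studies: list of dicts with keys 'design', 'log_est' and 'se'.
    """
    kept = [s for s in studies if s["design"] not in excluded_designs]
    if not kept:
        raise ValueError("no studies left after exclusion")

    def pool(subset):
        return pool_fixed([s["log_est"] for s in subset],
                          [s["se"] for s in subset])

    return {"all": pool(studies), "without_excluded": pool(kept)}
```

Calling `with_and_without(studies, {"case-control"})` would return both summary estimates side by side, so the influence of the case control studies can be read off directly.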
Appendix 1. Proposed PubMed strategy, which will be modified as appropriate for use in the other databases
Protocol first published: Issue 2, 2012
Contributions of authors
All authors contributed to drafting this protocol.
Declarations of interest
Sources of support
- Clinical and Translational Sciences Institute (CTSI), University of California, San Francisco (UCSF), USA.
- No sources of support supplied