Dr Suetonia Palmer, Renal Division, Brigham and Women's Hospital, Harvard Medical School, Harvard Institutes of Medicine, Room 550, 4 Blackfan Circle, Boston, MA 02115, USA. Email: email@example.com
A systematic review provides the best summary of evidence for clinical decision-making in nephrology by summarizing all the primary studies that evaluate a specific clinical question. By using rigorous and pre-specified methods, conclusions about the overall effect of an intervention can be more reliable, precise and comprehensive in a systematic review than those derived from individual studies. In this article, we describe the key components of a systematic review and meta-analysis. We summarize the features of a systematic review that should be looked for when considering the accuracy and validity of its results – particularly when applying the outcomes of a systematic review to a clinical question.
You are a nephrologist for a home haemodialysis training centre. Your patient requiring haemodialysis is in his mid-thirties and has a haemoglobin level of 80 g/L. He feels well but reports being a little tired. He has heard that erythropoietin treatment to correct his anaemia might improve his overall quality of life; he wishes to stay working while on haemodialysis and wants to know whether erythropoietin would help until he gets a kidney transplant. You are aware of potential treatment-related toxicity when prescribing erythropoietin to achieve higher haemoglobin levels in patients with chronic kidney disease (CKD). A simple search on PubMed for anaemia and chronic kidney disease retrieves 6225 citations (September 2009). You are also aware of a systematic review on the topic, which could provide a quick overview rather than dredging through the original trial literature, but how well does it summarize the trials in this area?1 Does the systematic review include all the important randomized, controlled trials? Does it omit trials that tend to show a null effect for the intervention (bias)? Does the review take into account the differences between available trials (patient population, intervention, outcomes, quality) that might be important when applying the findings to your patient? Are adequate data available from the trials pooled by the systematic review to answer a clinical question? In the present example we ask, what is a safe haemoglobin target during erythropoietin therapy for an individual receiving haemodialysis?
In the 20 years to 2001, citations in MEDLINE alone doubled to 500 000 per year, and included 25 000 randomized, controlled trials per annum.2 Although numbers are lower in nephrology,3 there has also been an ascending trend in the number of published renal randomized, controlled trials (Fig. 1). It is obvious that synthesizing this evidence to answer clinical questions is challenging, at best. It is also evident from examples in the literature that the time from availability of new evidence to implementation into current practice can be slow (e.g. nearly 20 years for thrombolysis in acute myocardial infarction),4 possibly resulting from a collective inability to rapidly summarize and digest the evidence that is continuously being published.
Systematic reviews, using rigorous methods to identify and critically appraise all existing primary studies relating to a specific question/topic, can help clinicians identify and apply good-quality evidence to decision-making. Systematic reviews aggregate primary data from several types of studies to answer specific clinical questions. Appropriate study methods include randomized, controlled trials to answer intervention questions, observational studies for questions of aetiology and prognosis, and diagnostic test accuracy studies for diagnosis or screening. Indeed, when asking clinical questions, the systematic review is at the highest level in the hierarchy of evidence.5 In order for a systematic review to be an appropriate aggregation of the primary literature, however, specific methodology must be applied stringently; being aware of these methods allows critical appraisal of the results when applying systematic reviews to clinical care.6 In this article, we review the key items of a systematic review and the key questions a reader should consider when interpreting its results. Due to space constraints, we will focus our discussion on systematic reviews of randomized, controlled trials.
Why we need systematic reviews
• Comprehensive and unbiased summaries of the literature
• Formal mechanism for pooling data between different studies
• Formal and systematic system for exploring reasons for variable study results
WHAT IS A SYSTEMATIC REVIEW?
A systematic review identifies and combines evidence from original research that fits pre-defined characteristics to answer a specific question (Table 1). Meta-analysis is the statistical method within a systematic review that combines trial-level results and, in some cases, individual patient data derived from existing studies (individual patient data analysis).
Table 1. Key components of a systematic review
Reproduced from The Cochrane Handbook for Systematic Reviews of Interventions22 with permission from the Cochrane Collaboration.
• A clearly stated set of objectives with pre-defined eligibility criteria for studies
• An explicit, reproducible methodology
• A systematic search that attempts to identify all studies that would meet the eligibility criteria
• An assessment of the validity of the findings of the included studies, for example, through the assessment of risk of bias
• A systematic presentation, and synthesis, of the characteristics and findings of the included studies
DOES THE REVIEW ASK A CLEAR CLINICAL QUESTION?
Using the example given in the introduction – what is the safe haemoglobin level during erythropoietin therapy for an individual – we can construct a clear clinical question to decide whether a systematic review applies to our current clinical situation. The clinical question incorporates several components: the Population, Intervention, Comparison, Outcome and Method (PICOM). For the present clinical example, the components of the clinical question would be:
• Patient or population – individual with CKD receiving haemodialysis
• Intervention or exposure – higher target haemoglobin level achieved with erythropoietin
• Comparison – lower haemoglobin target achieved with erythropoietin
• Outcomes – either beneficial, such as quality of life, or harmful, such as mortality risk
• Method of study design for the systematic review – intervention → randomized, controlled trial
Using this predefined question, we can then locate a systematic review that is relevant to our clinical situation.1 Such a review should incorporate a similarly designed clinical question stated in the title, abstract or early in the text to help us quickly identify its relevance.
THE SEARCH FOR PRIMARY LITERATURE – WAS IT EXPLICIT AND COMPREHENSIVE?
For a systematic review of intervention studies, the goal is to understand the true estimate of effect of an intervention across all available randomized, controlled trials, or alternatively to recognize that trial data are inadequate, or not available, to reach a conclusion about treatment efficacy and toxicity. We therefore need to be sure that the reported search strategy within a systematic review will find all potentially relevant studies and, where possible, unpublished data. When a systematic review excludes pertinent trials through incomplete searching of the literature, we cannot be confident that the summary treatment effect reported by the systematic review approaches the true effect of the intervention, particularly given that inadequate searching may omit trials with smaller or null effect sizes. Adding negative trials or unpublished data to pre-existing systematic reviews has previously revealed that an intervention may in fact have important adverse effects that should be considered in treatment decision-making.7 An important example is the story of selective cyclo-oxygenase-2 inhibitors, for which meta-analysis quantified the significantly increased risk of myocardial infarction associated with their use,8,9 and helped ensure their subsequent withdrawal from the market.10
In order to avoid systematic error (‘selection bias’), we can ask whether a systematic review has conducted a comprehensive and replicable search strategy. For systematic reviews in nephrology, searching databases such as EMBASE, CINAHL, Science Citation Index and particularly trial registries (such as the Cochrane Renal Group's specialized register and the Cochrane Central Register of Controlled Trials (CENTRAL)) may identify relevant articles that are not indexed by MEDLINE. Approximately 10% more randomized, controlled trials are identified by searching Cochrane's CENTRAL database than by other databases, including MEDLINE.11 This is likely due to the systematic and ongoing hand-searching of the literature carried out by the Cochrane Collaboration, which also includes trials published in languages other than English and trials for which results have been presented solely in conference proceedings but not as full text in a scientific journal. Excluding non-English publications, which is more common in reviews published in journals as opposed to those in the Cochrane Library, may also contribute to an incorrect estimate of treatment effect.12
HOW WERE STUDIES INCLUDED IN THE SYSTEMATIC REVIEW?
Once a search has identified potential trials, the authors of the systematic review should determine the suitability of each citation for inclusion in the review. It is important that only studies matching the inclusion criteria are included in the systematic review, so that the systematic review answers a specific clinical question. Prospective criteria for study inclusion and exclusion should be explicitly stated in the review to minimize selectivity by authors. These criteria are a requirement before commencing Cochrane reviews, for which a study protocol is developed, peer reviewed and published before initiating the review. The decision regarding which studies to include in a systematic review may have an important effect on its conclusions, for example regarding the overall utility of a healthcare intervention.13 Therefore, study inclusion assessment should be completed independently by at least two authors, with disagreements generally arbitrated by a third. Readers of systematic reviews can look for a flow chart (usually presented as Figure 1 of the review) describing the details of studies identified, studies excluded, reasons for exclusion and numbers of studies included in the final review.
HOW IS THE ESTIMATE OF TREATMENT EFFECT CALCULATED WITHIN INDIVIDUAL STUDIES?
If the outcome of interest is dichotomous (the outcome is one of two possibilities – for example, death or survival), the treatment effect is calculated for each trial as a risk ratio, an odds ratio or a risk difference, together with the 95% confidence interval (95% CI; the range within which we are 95% confident that the true effect lies). While full discussion of all methods is beyond the scope of this review, dichotomous outcomes are frequently evaluated as a relative risk (RR), which deserves a brief explanation. A RR divides the event rate in the intervention group (number of events divided by the total number of individuals randomized in that group) by the event rate in the comparison group. For example, if 20 of 100 patients in the active intervention group (those randomized to erythropoietin to normalize haemoglobin levels) experienced an event and 10 of 100 patients in the control group (those randomized to a lower haemoglobin target) experienced the event, then the RR is 2 (20/100 divided by 10/100), indicating that the outcome is twice as likely with the intervention as with the comparison treatment. Interpretation of this risk for a specific patient is possible when the actual risk of the outcome for that patient without treatment is known (e.g. when RR = 2, a doubling of risk from 2% to 4% is quite different from a doubling of risk from 10% to 20%).
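The RR arithmetic above can be sketched in a few lines of Python, using the hypothetical 20/100 versus 10/100 example from the text (the function name is ours, for illustration only):

```python
# Worked example of the relative risk (RR) calculation described above,
# using the hypothetical 20/100 versus 10/100 event rates from the text.

def relative_risk(events_tx, n_tx, events_ctrl, n_ctrl):
    """Event rate in the intervention group divided by that in the control group."""
    return (events_tx / n_tx) / (events_ctrl / n_ctrl)

rr = relative_risk(20, 100, 10, 100)
print(rr)  # 2.0 -> the outcome is twice as likely with the intervention
```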
If the outcome of interest is a continuous variable (an example is systolic blood pressure, mmHg), then the effect size of the intervention is summarized as a mean difference (MD; and its 95% CI). The MD for the outcome in each trial is the amount by which an intervention changes the outcome on average compared with the control. In the present example of studies using erythropoietin to treat anaemia in CKD, the end of treatment mean systolic blood pressure in the higher haemoglobin target group is compared with the end of treatment value in the lower haemoglobin target group, to calculate the MD between treatment groups within each individual trial.
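The MD calculation can be sketched as follows; all numbers (means, standard deviations and group sizes for the two haemoglobin target arms) are invented for illustration, and the CI uses the usual normal approximation:

```python
import math

# Mean difference (MD) with a 95% CI for a continuous outcome, using
# hypothetical end-of-treatment systolic blood pressures (mmHg).

mean_high, sd_high, n_high = 142.0, 18.0, 150   # higher haemoglobin target arm
mean_low, sd_low, n_low = 135.0, 16.0, 148      # lower haemoglobin target arm

md = mean_high - mean_low                       # average effect of the intervention
se = math.sqrt(sd_high**2 / n_high + sd_low**2 / n_low)
ci = (md - 1.96 * se, md + 1.96 * se)
print(round(md, 1), round(ci[0], 1), round(ci[1], 1))
```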
HOW ARE THE DATA FROM DIFFERENT STUDIES POOLED?
Once the effect of the intervention on an outcome is calculated within each trial (either the RR or MD), the next step is to combine these treatment effects for each outcome to calculate an overall RR (dichotomous variable) or MD (continuous variable) between two treatments (meta-analysis). Combining results from individual studies is not simply a matter of treating all studies equally and averaging their data. Instead, the studies are combined using a weighted average. The contribution of a trial to the overall effect size (its weight) depends on its variance (the certainty of the trial's effect size). Studies with smaller estimates of variance (greater precision) and/or with more events make a larger contribution to the overall effect estimate of an intervention.14
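Inverse-variance weighting as described above can be sketched as follows. A fixed-effect model is assumed, pooling is done on the log scale (as is conventional for risk ratios), and the log RRs and variances are hypothetical:

```python
import math

# Minimal sketch of fixed-effect, inverse-variance pooling of risk ratios.
# Pooling is done on the log scale; log RRs and variances are hypothetical.

studies = [
    (math.log(1.30), 0.02),   # (log RR, variance of log RR)
    (math.log(1.10), 0.05),
    (math.log(0.95), 0.10),
]

# Each study's weight is the inverse of its variance: precise studies count more
weights = [1.0 / var for _, var in studies]
pooled_log_rr = sum(w * lrr for (lrr, _), w in zip(studies, weights)) / sum(weights)
se = math.sqrt(1.0 / sum(weights))  # standard error of the pooled log RR

pooled_rr = math.exp(pooled_log_rr)
ci_lower = math.exp(pooled_log_rr - 1.96 * se)
ci_upper = math.exp(pooled_log_rr + 1.96 * se)
print(round(pooled_rr, 2), round(ci_lower, 2), round(ci_upper, 2))
```

Note how the first (most precise) study pulls the pooled estimate towards its own RR, exactly as described in the text.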
HOW TO INTERPRET A FOREST PLOT
Figure 2 shows a graphical representation (known as a forest plot) commonly used in systematic reviews to summarize data from a systematic review of haemoglobin targets in patients with CKD.1 In this example, studies are pooled to examine the risk of mortality using human recombinant erythropoietin to treat anaemia (higher haemoglobin vs lower haemoglobin level) in people with CKD.1
In this forest plot:
1. The left-hand column shows the eight included randomized, controlled trials that have mortality data available for analysis. In this figure they are listed in chronological order.
2. The central graphic incorporates information from the two right-hand columns, Risk Ratio (95% CI) and Weight (%), for each study in the left-hand column. Each trial is represented as a box (the box area relating to the study's weight) plotted against the risk ratio (also known as relative risk, RR) for that study. The 95% CI for each study's risk ratio is shown as a horizontal line. The open diamond in the lowest part of the graphic corresponds to the overall treatment effect, summarizing the pooled treatment effect of the individual trials. The width of the diamond incorporates the 95% CI of the pooled result. The risk for each study and the pooled risk are graphed on a logarithmic scale where the solid vertical line (risk ratio = 1) indicates the null hypothesis (no difference between the intervention and the comparison groups for the outcome of interest). The dotted vertical line indicates the overall pooled risk estimate, in this case plotted at a relative risk of 1.17. The direction of this risk (increased risk in the higher target group) is shown on the x-axis.
3. The Risk Ratio column gives the actual risk ratio and 95% CI for each included study comparing higher versus lower target groups. The overall pooled treatment effect corresponding to the diamond is given (RR 1.17, 95% CI 1.01–1.35), indicating that individuals with CKD receiving recombinant human erythropoietin targeted to a higher haemoglobin (around 130 g/L) have a 17% increased risk of dying compared with individuals receiving treatment targeted to a lower haemoglobin (around 110 g/L). We are 95% confident that the true excess risk lies between 1% and 35% (RR 1.01–1.35).
4. The Weight (%) column shows how much each study contributes to the pooled estimate of effect. The studies are weighted according to sample size and number of events – the study by Besarab et al.15 dominates the overall risk ratio owing to its high number of events and large sample size (483 deaths among 1233 subjects over 29 months of follow-up).
5. Forest plots of dichotomous outcomes may also include a column that shows the events in the intervention and comparison groups (number of deaths/number of individuals) for each study and an overall summation (not shown in this example). This helps to interpret the overall estimate of risk (a risk increase of 17% can be better interpreted when the absolute risk for patients not receiving the intervention is known).
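The final point, that a 17% relative increase means different things at different baseline risks, can be illustrated with a short sketch (the baseline control-group risks are hypothetical):

```python
# Translating the pooled relative risk into absolute terms, as discussed above.
# The baseline (control-group) risks are hypothetical.

rr = 1.17  # pooled risk ratio for mortality from the review

for baseline_risk in (0.05, 0.20):          # 5% vs 20% risk without treatment
    risk_with_tx = baseline_risk * rr       # assumes the RR applies at this baseline
    absolute_increase = risk_with_tx - baseline_risk
    print(f"baseline {baseline_risk:.0%}: treated {risk_with_tx:.1%} "
          f"(absolute increase {absolute_increase:.1%})")
```

The same RR of 1.17 produces an absolute increase of under 1% at the lower baseline but over 3% at the higher one.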
What happens if the meta-analysis is trying to combine apples with oranges? In other words, does the systematic review aggregate poor-quality trials that carry a substantial risk of bias together with higher-quality trials? Inclusion of low-quality trials may yield an unreliable conclusion about treatment efficacy or toxicity. To explore the possibility that a meta-analysis includes trials of lower quality, and therefore provides a less reliable estimate of treatment effect, the reader of a systematic review should assess whether the authors have conducted a formal assessment of the methodological quality of each included trial. Specifically, a systematic review should report an assessment of each domain considered to be indicative of study quality. These are:
1. Allocation concealment (‘selection bias’): Allocation concealment is adequate when the trial investigators cannot determine the treatment group to which a patient has been assigned. Knowledge of treatment allocation may lead to exaggerated treatment effects.
2. Blinding of patients, investigators and outcome adjudicators: The risk of bias is minimized when blinding is adequate to prevent additional non-randomized interventions being administered to a particular treatment group.
3. Study attrition (the proportion of patients with completed follow-up): Patients excluded after allocation of treatment may be unrepresentative of the entire study population in ways that might relate to prognosis.
4. Analysis conducted by the intention-to-treat principle: The primary outcomes should be assessed in all patients at pre-specified time points, according to their initial allocated treatment.
It has been shown through systematic review of meta-analyses that the estimate of effect summarized by meta-analysis may substantially favour the intervention when the conduct of the included trials does not follow these principles, particularly when allocation concealment is inadequate.16,17
Inclusion of poor-quality studies in a meta-analysis may also lead to heterogeneity; that is, differences in effect estimates between studies that address the same clinical question. A statistical test of heterogeneity tells us whether such differences in treatment effects within a meta-analysis are due to study characteristics (heterogeneity), which need to be explored and explained, or are due to chance alone. The test for heterogeneity is Cochran's Q, which is similar to a chi-squared test and yields a P-value that can be interpreted (P < 0.05 indicates the presence of heterogeneity). Statistical evaluation of heterogeneity is also expressed as the I2 statistic where, simply put, an I2 of 0% indicates no heterogeneity and increasing values towards a maximum of 100% indicate increasing heterogeneity. Higgins et al. defined low, moderate and high levels of heterogeneity as I2 values of 25%, 50% and 75%, respectively.18 We note in Figure 2 that while five of eight trials appear to give similar RR for mortality comparing higher and lower haemoglobin target values, three trials (Levin et al.,19 Rossert et al.,20 and Parfrey et al.21) differ in the direction of treatment effect from the rest – and show higher risks of death with a lower haemoglobin target. The authors of this systematic review report no significant heterogeneity in this analysis (χ2 = 9.59, P = 0.213, I2 = 27%), suggesting that the variability in effect size observed between studies might be due to chance alone.
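Cochran's Q and I2 can be computed directly from study effect sizes and their variances. The following sketch uses hypothetical log risk ratios with a fixed-effect pooled estimate; in this invented example Q falls below its degrees of freedom, so I2 is floored at 0%:

```python
# Sketch of Cochran's Q and the I2 statistic for a fixed-effect meta-analysis.
# Effect sizes (log risk ratios) and their variances are hypothetical.

effects = [0.26, 0.10, -0.05, 0.18]     # log RR per study (invented)
variances = [0.02, 0.05, 0.10, 0.04]    # variance of each log RR (invented)

weights = [1.0 / v for v in variances]
pooled = sum(w * e for w, e in zip(weights, effects)) / sum(weights)

# Cochran's Q: weighted squared deviations of each study from the pooled effect
q = sum(w * (e - pooled) ** 2 for w, e in zip(weights, effects))
df = len(effects) - 1

# I2: proportion of total variability beyond chance, floored at 0%
i2 = max(0.0, (q - df) / q) * 100 if q > 0 else 0.0
print(round(q, 2), round(i2, 1))  # here Q < df, so I2 = 0% (no heterogeneity)
```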
Once heterogeneity is identified using formal statistical analysis, a preliminary approach to its interpretation is visual analysis of the forest plot. Heterogeneity may be due to differences between studies, including variations in the patient population, the intervention (including dose, route, frequency of administration) and study quality. In the example in Figure 2, we can ask how the studies of Levin et al., Rossert et al. and Parfrey et al. differ from the others in the plot: did they have differing event rates; were they conducted in different populations; were they of different method quality; or were they significantly smaller or larger studies (or other similar questions)?
When high-level or significant heterogeneity is identified, the causes of heterogeneity can be explored by subgroup analyses, by meta-regression or by qualitative assessment. Subgroup analysis pools similar studies together to allow the systematic reviewer to examine an effect estimate within subgroups of studies. This could be, for example, separating high-quality from low-quality studies into differing subgroups and summarizing treatment effects of each individual subgroup. It should be noted, however, that any reduction in heterogeneity achieved by dividing studies into such subgroups might simply reflect a loss of power to discern important variability that still remains between studies within a single subgroup. Overall, we should be cautious when interpreting treatment effects derived from analyses with high levels of heterogeneity that cannot be explained. Conversely, the results of a pooled estimate, when adequately explored in terms of heterogeneity, may provide a more informative understanding of the true treatment effect than individual studies alone.
ARE THE CONCLUSIONS APPROPRIATE?
We should ensure the systematic review appropriately places the results in context. A lack of treatment effect (or evidence of significant benefit or harm) following systematic analysis of well-conducted trials is not the same as a lack of treatment efficacy when few or no trials are available to answer the clinical question. Indeed, a well-conducted systematic review identifying that few or no good-quality studies are available to answer a specific clinical question is as important as a review that contains an abundance of good-quality studies – and alerts us to the possibility that further trials are still needed to answer a clinical question. Recommendations for clinical practice derived from a systematic review should also define, based on the available data, the patients in whom an intervention will affect an outcome. For example, for our patient receiving dialysis, we might ask whether the risk of mortality with a higher haemoglobin target is different for individuals receiving dialysis compared with patients with earlier stages of CKD. The meta-analysis by Phrommintikul et al.1 concluded that the finding of increased mortality with higher haemoglobin targets was not influenced by the stage of CKD, suggesting that the increased mortality observed with anaemia correction might be of concern to our example patient.
In conclusion (Table 2), a systematic review is the ideal study design to summarize the primary data available to answer a clinical intervention, prognostic or diagnostic accuracy question. For the patient in our introductory scenario, we have identified a systematic review that summarizes the treatment effects of increasing haemoglobin levels in people with CKD.1 Together, randomized, controlled trials show a consistent and significant increase in all-cause mortality of approximately 17% when targeting a higher haemoglobin level with erythropoietin compared with a lower haemoglobin target. We can inform our patient receiving haemodialysis that correcting his anaemia may increase his mortality risk, and this information should be taken into account when deciding on treatment goals for his anaemia management while he awaits renal transplantation.
Table 2. Summary Table
1 A systematic review combines evidence from original research that fits pre-defined characteristics to answer a specific question.
2 An adequate literature search reduces bias in a systematic review by including trials with smaller effect sizes or negative results.
3 Results of studies are pooled to give an overall effect estimate of treatment, given as a risk ratio for dichotomous outcomes and as a mean difference for continuous variables. Studies contribute to the pooled effect according to the precision of their individual effect sizes.
4 High-level heterogeneity (differences between study effect sizes that are not due to chance alone) should be explored. Caution is required when interpreting overall treatment effects when marked unexplained heterogeneity is present in the analyses.
We acknowledge the contribution of Gail Higgins, trial search coordinator of the Cochrane Renal Group, who provided data for the development of Figure 1.