An introduction to systematic reviews and meta-analyses in health care

Authors


Alicja R Rudnicka
E-mail address: arudnick@sgul.ac.uk

Abstract

Citation information: Rudnicka AR & Owen CG. An introduction to systematic reviews and meta-analyses in health care. Ophthalmic Physiol Opt 2012, 32, 174–183. doi: 10.1111/j.1475-1313.2012.00901.x

Important issues in medical research are often examined in a number of studies, sometimes using different study designs. Data from such studies are often reviewed to reach a consensus for or against a particular hypothesis, such as the use of a new therapy/treatment or a public health policy. However, while reviews can provide a rapid synthesis of the findings on a given issue, they are not a panacea: the evidence they provide depends heavily on the quality of the studies that contribute to them. Reviews of good-quality studies will provide stronger evidence for the research question under investigation than reviews of studies with weaker methodology. Reviewers who ‘cherry pick’ the studies to be included may bias the findings towards a preconceived hypothesis. Hence, when reviewing evidence it is important that all studies on a given topic are identified and included where possible, i.e. that the review is systematic, reproducible and representative of the totality of evidence: a so-called ‘systematic review’. This article aims to familiarise those in the ophthalmic sector with the methods and guidelines used to improve the quality of studies and of systematic reviews. It also outlines how numerical data obtained from a systematic review can be combined using statistical methods called ‘meta-analysis’. By combining numerical estimates from different studies we can estimate an effect or outcome of interest more precisely.

Introduction

Systematic reviews are increasingly relied upon by health professionals to assimilate findings from an ever-expanding medical literature. However, while systematic reviews are extremely useful in providing a rapid synthesis of evidence on a given topic, it is important to be able to recognise when a review has been carried out well and its summary of evidence can be relied upon. A systematic review or overview has been defined as ‘a review of a clearly formulated question that uses systematic and explicit methods to identify, select and critically appraise relevant research, and to collect and analyse data from the studies that are included in the review. Statistical methods (meta-analysis) may or may not be used to analyse and summarise the results of the included studies’ (http://www.nntonline.net/ebm/newsletter/2003/10/Systematic_Reviews_and_Meta-analyses.asp). ‘Relevant research’ in this context refers to observational or experimental studies (sometimes both), and the quality of these studies underpins the strength of evidence they can collectively provide. Observational studies include surveys (cross-sectional studies), cohort studies and case-control studies. They may summarise levels of an outcome or exposure in a given population or, more commonly, examine the association between exposures and outcomes.1 An outcome is the health-related state (or disease) of interest. An exposure (also called a risk factor) is something that can increase or decrease the occurrence of an outcome. Given the challenges of observational studies,2 there have been a number of attempts to improve both their quality3 and the quality of reviews of observational studies,4 so that they can contribute more usefully to the body of literature on a given topic.
Within the field of ophthalmic epidemiology, systematic reviews of observational studies have estimated the prevalence of disease, including major causes of visual loss,5–7 and have examined the associations between exposures and ocular outcomes.5,6,8–11 Experimental studies (intervention studies/clinical trials) differ from observational studies in that they examine the effect of a purposely imposed preventative or therapeutic measure (pharmaceutical, surgical, policy related) on disease outcome. Randomised controlled trials are a type of experimental study often considered the gold standard when examining the effect of an intervention on outcome. The term ‘randomised’ refers to random assignment (randomisation). Individuals (or communities) are allocated randomly to intervention or control groups, and the allocation of each subject to a group is independent of the allocation of other subjects. This is the defining feature of the ‘randomised controlled trial’. All factors that might be related to the outcome (so-called potential ‘confounders’), whether measured or not, are expected to be balanced between the groups, except for the intervention of interest. Hence, any difference in the development of the outcome between the intervention and control groups (i.e., between those receiving the intervention and those not) can be attributed to the intervention itself, or to chance. A well-designed experimental study can provide unbiased evidence of the efficacy and safety of an intervention. Again, guidelines have been published to improve the quality of intervention studies,12 and of reviews of such studies.13 Within the field of ophthalmic epidemiology, a large number of reviews of intervention studies are carried out by the Cochrane Eye and Vision Group (CEVG) (http://eyes.cochrane.org/). A review by this group of the effect of photodynamic therapy for the management of neovascular age-related macular degeneration (AMD) is used as an example below.14

In terms of the potential importance of systematic reviews, two landmark articles published in 1992 showed what an earlier systematic review and meta-analysis of the individual clinical trials of antithrombotic therapy after a heart attack could have achieved. The beneficial effects of this therapy (which saves lives as well as improving patients’ health) could have been introduced into clinical practice some 13 years earlier.15,16 Instead, many more clinical trials were carried out than were necessary. This wasted resources and, more importantly, meant that patients who could have benefited from this therapy did not receive it.

It is important to recognise that a systematic review is a study in itself. Hence, a systematic review will have a protocol (developed and established in advance) containing a number of generic steps, which are summarised in Box 1. Each step is considered in further detail under its respective heading.

Box 1. Generic steps in a systematic review

Formulate the research question or hypothesis
Identify all relevant research on the topic
Provide exclusion and inclusion criteria for the systematic retrieval of relevant studies
Detail the method of extracting the relevant quantitative and/or qualitative information from each study or report
Assess quality of each study or report
Summarise the evidence using appropriate statistical methods
Interpret findings and present a balanced and impartial summary acknowledging the role of systematic error (bias) and random variation (chance)

Formulating the research question or hypothesis

It is important to focus the review from the outset so that it addresses a specific question or hypothesis; formulating too many hypotheses can make a review unwieldy and difficult to execute. Hence, any systematic review should have a study protocol defined a priori, which frames the purpose of the review and the questions/hypotheses it seeks to address. This requires:

(1) definition of the primary exposure/intervention of interest;
(2) definition of the measures of disease outcome and its occurrence;
(3) details of the relevant patient groups or populations to whom the findings of the review will apply.

In the example of a systematic review of intervention studies given above,14 the intervention is photodynamic therapy following verteporfin injection, compared with photodynamic therapy following intravenous 5% dextrose. One of the outcome measures in this review is ‘losing three lines or more in visual acuity at 12 months’. The population to whom the findings apply is patients with choroidal neovascularisation due to AMD.

Identify all relevant evidence on the topic

This requires a search strategy that optimises the identification of all available studies that address the hypothesis to be examined. It is advisable to seek guidance to ensure that the most efficient and comprehensive strategy is devised, maximising the retrieval of relevant articles without missing any. Electronic databases are now widely used to search for relevant articles, and hand searches of relevant journals are used less and less. An argument against electronic databases was that they did not contain historic articles, but this is no longer the case: articles dating back to the 1950s have now been added to some databases, e.g., Medline. Searches are carried out using a combination of text words and subject headings in more than one database. The strategy used is often outlined in published reviews so that it can be replicated by other investigators.5,6,14,17 Electronic searches are often validated in two ways. First, by checking that they identify the articles cited by recent reviews (retrospective citations). Second, by checking that they identify the articles that cite key methodological or classic papers (prospective citations), which can be traced using the Science Citation Index. The reference lists of full-text papers should also be examined to ensure that the search strategy has not missed anything. Note that if key papers are not identified, the search strategy cannot be relied upon and should not be used without revision.
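As an illustration of combining text words with subject headings, the sketch below builds a Boolean query. Synonyms for each concept are joined with OR, and the concept blocks are then joined with AND. The search terms are hypothetical examples. The field tags assume PubMed-style syntax ([mh] for a MeSH subject heading, [tiab] for a title/abstract text word). Other databases use different syntax, so a real strategy would need translating for each one.

```python
def or_block(terms):
    """Join the synonyms for one concept with OR, wrapped in parentheses."""
    return "(" + " OR ".join(terms) + ")"


def build_query(concepts):
    """Join concept blocks with AND, so every concept must be matched."""
    return " AND ".join(or_block(terms) for terms in concepts)


# Hypothetical concept groups; [mh] = MeSH subject heading, [tiab] = title/abstract
# text word (PubMed-style tags assumed; other databases use different syntax).
concepts = [
    ['"macular degeneration"[mh]', '"macular degeneration"[tiab]', "AMD[tiab]"],
    ["photochemotherapy[mh]", '"photodynamic therapy"[tiab]', "verteporfin[tiab]"],
]

query = build_query(concepts)
print(query)
```

Keeping each concept as a separate list makes it easy to add synonyms later and to report the exact strategy in the published review.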

Data from unpublished studies are increasingly used in reviews to limit the influence of publication bias. Publication bias arises when studies showing statistically significant results are preferentially accepted for publication over studies showing null results. This tends to happen with small studies that, just by chance, show ‘statistically significant’ and atypically large effects. The funnel plot was introduced to examine the data visually for evidence of publication bias.18,19 It plots the standard error against the effect/outcome size for each study. In the absence of publication bias the points form a symmetrical funnel shape: the larger, more precise studies cluster near the overall effect and the smaller studies scatter more widely around it. Deviations from this shape suggest that publication bias may be present, although asymmetry can also arise when smaller studies are of lower methodological quality (see Figure 1).18,19 The identification of unpublished studies is not straightforward and tends to be less systematic, usually capturing studies known to those working in the area. Relevant studies may be excluded if the search is restricted to articles published in English, or if it omits the ‘grey literature’ (e.g. conference proceedings and technical reports). Grey literature is increasingly available electronically, so including these sources of evidence is becoming easier.
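The sketch below computes, for a few hypothetical studies, the coordinates that a funnel plot displays: each study's log odds ratio against its standard error. It also computes pseudo 95% limits around a fixed effect (inverse-variance) pooled estimate; these limits widen as the standard error grows. The counts are invented for illustration. The ‘within funnel’ flag is only a rough aid, and judging asymmetry still relies on inspecting the plot itself.

```python
import math

# Hypothetical 2x2 counts: (events_treated, n_treated, events_control, n_control).
studies = {
    "Study A": (30, 200, 45, 200),
    "Study B": (12, 60, 20, 60),
    "Study C": (5, 25, 9, 25),
}


def log_odds_ratio(a, n1, c, n2):
    """Log odds ratio and its standard error (Woolf's formula)."""
    b, d = n1 - a, n2 - c  # non-events in each group
    log_or = math.log((a * d) / (b * c))
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    return log_or, se


def funnel_limits(centre, se, z=1.96):
    """Pseudo 95% limits of the funnel at a given standard error."""
    return centre - z * se, centre + z * se


points = {name: log_odds_ratio(*counts) for name, counts in studies.items()}

# Fixed effect (inverse-variance) pooled log OR marks the centre of the funnel.
weights = {name: 1 / se ** 2 for name, (_, se) in points.items()}
pooled = sum(weights[n] * points[n][0] for n in points) / sum(weights.values())

for name, (log_or, se) in points.items():
    lower, upper = funnel_limits(pooled, se)
    inside = lower <= log_or <= upper
    print(f"{name}: OR {math.exp(log_or):.2f}, SE {se:.3f}, within funnel: {inside}")
```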

Figure 1.

 Hypothetical funnel plots: (top) symmetrical plot in the absence of bias (open circles indicate smaller studies showing no beneficial effects); (middle) asymmetrical plot in the presence of publication bias (smaller studies showing no beneficial effects are missing); (bottom) asymmetrical plot in the presence of bias due to low methodological quality of smaller studies (open circles indicate small studies of inadequate quality whose results are biased towards larger beneficial effects). The vertical axis is the standard error of the logarithm of the odds ratio. The horizontal axis is the odds ratio plotted on a logarithmic scale. Odds ratios >1 indicate a higher risk of the outcome in the exposed/treated compared to the unexposed/control group, whereas odds ratios <1 indicate a lower risk of the outcome in the exposed/treated compared to the unexposed/control group. Reproduced from Sterne JAC, Harbord RM. Funnel plots in meta-analysis. Stata J 2004; 4: 127–141.

Once a search strategy is finalized the date the search was undertaken needs to be noted. If reviews take a long time to complete it may be necessary to update the search to include studies that may have been published in the interim. An initial search of the titles and abstracts is undertaken (usually by two investigators independently to ensure nothing is missed) to identify articles requiring retrieval of the full paper. It is best to be inclusive at this stage to ensure all studies with relevant data are identified.
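Independent screening by two investigators can be sketched as below. The sketch assumes that each screener records an include/exclude decision for each record; the record identifiers and decisions are hypothetical. Following the advice to be inclusive, a record goes forward to full-text retrieval if either screener flags it. Disagreements are listed for discussion.

```python
# Hypothetical title/abstract decisions from two independent screeners.
reviewer_1 = {"rec01": True, "rec02": False, "rec03": True, "rec04": False}
reviewer_2 = {"rec01": True, "rec02": True, "rec03": False, "rec04": False}


def screen(decisions_1, decisions_2):
    """Return records to retrieve in full and records the screeners disagreed on.

    Inclusive rule: a record goes forward if EITHER screener includes it; a record
    missing from one screener's list is treated as 'include' (not yet excluded).
    """
    records = decisions_1.keys() | decisions_2.keys()
    retrieve = sorted(r for r in records
                      if decisions_1.get(r, True) or decisions_2.get(r, True))
    disagree = sorted(r for r in records
                      if decisions_1.get(r) != decisions_2.get(r))
    return retrieve, disagree


retrieve, disagree = screen(reviewer_1, reviewer_2)
print("Retrieve full text:", retrieve)
print("Discuss disagreements:", disagree)
```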

Provide exclusion and inclusion criteria for the systematic retrieval of relevant studies

Full articles should be assessed for eligibility against pre-defined inclusion criteria. Inclusion criteria usually concern the study design, e.g., the method used to measure or record exposure, the type of intervention and whether the outcome is relevant. Studies are also assessed for methodological quality. By default, studies not meeting the inclusion criteria are excluded, although specific exclusion criteria may also be decided a priori. A flow chart showing the procedural steps and the number of articles identified at each stage of the search is useful to show how the final number of studies included in the review was derived. The ‘Quality of Reporting of Meta-analyses’ group have published guidelines advising researchers, reviewers and journal editors on what such a flow diagram should include, the so-called ‘QUOROM statement’.20 These guidelines are specific to reviews of experimental evidence, but can easily be adapted for reviews of observational studies.5,6
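The numbers in a flow diagram should reconcile from one stage to the next. The hypothetical tally below derives each stage from the previous one by subtracting exclusions. This lets the final number of included studies be traced back to the records identified. All counts and database names here are illustrative only.

```python
# Hypothetical counts at each stage of a literature search.
identified = {"MEDLINE": 412, "EMBASE": 389, "Other sources": 14}
duplicates_removed = 251
excluded_title_abstract = 498
excluded_full_text = {"wrong population": 21, "no relevant outcome": 18,
                      "not randomised": 9}

total_identified = sum(identified.values())
screened = total_identified - duplicates_removed
full_text_assessed = screened - excluded_title_abstract
included = full_text_assessed - sum(excluded_full_text.values())

# Every box in the flow diagram must be non-negative for the counts to reconcile.
assert min(screened, full_text_assessed, included) >= 0, "counts do not reconcile"

print(f"Records identified: {total_identified}")
print(f"Records screened after removing duplicates: {screened}")
print(f"Full-text articles assessed: {full_text_assessed}")
for reason, count in excluded_full_text.items():
    print(f"  excluded ({reason}): {count}")
print(f"Studies included in the review: {included}")
```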

Detail the method of extracting the relevant quantitative and/or qualitative information from each study or report

For the studies meeting the inclusion criteria, the reported findings are extracted onto a data extraction form or database. The information below is usually extracted for each study. This is often done by two or more researchers independently, and their extractions are then compared to avoid errors.

  1. Study sample: Information on the sample size, the location of the study, and the age, gender and ethnicity of the sample should be extracted and tabulated, along with any other relevant information. These data are useful because differences in these characteristics may explain heterogeneity in results across studies.
  2. Study design: For experimental studies, what type of design was used? Most are parallel in design: one group receives the intervention of interest and another group receives placebo or the best currently available treatment. However, other designs exist, such as cross-over and factorial designs.1 Was the allocation of patients to intervention and control groups randomised? Was the allocation concealed? Was the patient or the observer (single blind), or both (double blind), blinded to the treatment received? For observational evidence, is the study cross-sectional, cohort or case-control? Other study characteristics may also be relevant, such as the methods used to define the outcome or exposure/risk factor status, or the method of sampling, as these can influence study findings.
  3. Potential for bias: A judgement needs to be made as to the extent to which bias may have compromised each study. While this is less of a problem in well-designed experimental studies, it is a potential problem in observational studies. Bias is a systematic error that leads to errors in the estimation of disease occurrence or of the association between exposure and outcome. For instance, consider a cross-sectional study of the prevalence of AMD in which only individuals with visual loss receive a retinal examination. This study will underestimate the true prevalence of AMD in the population, since those with visual loss due to AMD represent only a subgroup of those with AMD. Similarly, if the same study examined the association between diet and AMD prevalence, the estimate of the association could be biased. This is because those with visual loss and AMD may not be representative of those with AMD and no visual loss. It is important to acknowledge that a systematic review of biased studies will simply yield greater certainty about a biased finding. In experimental studies, poor methods of randomisation or blinding, and unbalanced follow-up, may lead to biased results. For example, blinding avoids two kinds of bias. Patient bias is caused by a change in behaviour in response to knowing which treatment was received. Observer bias is caused by under- or over-estimating the outcome in a patient because the observer knows which treatment was received. It is preferable that both the observer and the patient are blinded to the allocation, but this is not always feasible. In a trial of a new pharmaceutical medication in tablet form, blinding could be achieved by ensuring that all groups receive tablets of the same size, shape and colour, in identical packaging. Those in the intervention group would receive tablets containing the new drug, while the control group would receive placebo or the best medication currently available. The only identifier would be a code on the packaging and in the patient’s notes, which can be deciphered by the monitoring committee/study statistician. It is well established that unblinded clinical trials tend to produce overoptimistic (biased) results.
  4. Numerical data: Note that qualitative reviews do not contain numerical data that can be analysed. Quantitative reviews include data such as …
    • Measures of outcome occurrence, e.g. the prevalence or incidence of a health-related/disease state.
    • Measures of effect or association, which quantify the difference in the outcome measure between groups. These include relative measures (e.g., rate ratio, risk ratio, odds ratio, hazard ratio) and mean differences.
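Comparing two independent extractions can be as simple as listing the fields on which they disagree. The sketch below assumes that each extractor's record is stored as a dictionary with the same (hypothetical) field names. Any mismatch is then checked against the source paper.

```python
# Hypothetical extraction records for the same study from two independent extractors.
extractor_1 = {"study": "Study A", "n_treated": 200, "events_treated": 30,
               "n_control": 200, "events_control": 45, "design": "parallel RCT"}
extractor_2 = {"study": "Study A", "n_treated": 200, "events_treated": 31,
               "n_control": 200, "events_control": 45, "design": "parallel RCT"}


def discrepancies(record_1, record_2):
    """List (field, value_1, value_2) for fields that differ or are missing in one record."""
    fields = sorted(record_1.keys() | record_2.keys())
    return [(f, record_1.get(f), record_2.get(f))
            for f in fields if record_1.get(f) != record_2.get(f)]


for field, value_1, value_2 in discrepancies(extractor_1, extractor_2):
    print(f"Check source paper: {field}: {value_1!r} vs {value_2!r}")
```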

Any of these measures can be used in a meta-analysis, provided sufficient information is also given to determine its variance (or, equivalently, its standard error, which is the square root of the variance).
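A study may report only an estimate and its 95% CI. In that case the standard error can usually be recovered from the width of the interval. This assumes the interval was calculated as estimate ± 1.96 × standard error, on the logarithmic scale for ratio measures. The sketch below uses the TAP study interval reported in Figure 2 (0.61 to 0.86). Because the published limits are rounded, the recovered value is approximate.

```python
import math

Z_95 = 1.96  # standard normal value for a two-sided 95% interval


def se_log_ratio_from_ci(lower, upper, z=Z_95):
    """SE of log(ratio) (e.g. RR, OR, HR) from its 95% CI.

    Ratio CIs are symmetric on the log scale, so log(upper) - log(lower) = 2 * z * SE.
    """
    return (math.log(upper) - math.log(lower)) / (2 * z)


def se_difference_from_ci(lower, upper, z=Z_95):
    """SE of a mean difference from its 95% CI (symmetric on the original scale)."""
    return (upper - lower) / (2 * z)


se_log_rr = se_log_ratio_from_ci(0.61, 0.86)  # TAP study 95% CI from Figure 2
print(f"SE of log RR: {se_log_rr:.3f}; variance: {se_log_rr ** 2:.4f}")
```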

Assess quality of each study or report

Methods of quality assessment should be specified in advance and focus on the key aspects of the study, such as response rates and appropriate adjustment for confounders. For randomised clinical trials, other markers of study quality include:

  • whether randomisation was carried out successfully with appropriate allocation concealment;
  • whether follow-up was reasonably complete in all groups;
  • whether single or double blinding was used.

In the review by Wormald et al.,14 one of the included studies gave participants verteporfin or 5% dextrose before photodynamic therapy, and the review gives the following information on masking (blinding) in that study: ‘….All study participants and outcome assessors, including vision examiners, photographers, ophthalmologists, Photograph Reading Center personnel and clinic monitors, were masked to the treatment assignment. The ophthalmologist responsible for applying the laser light was not masked to the fluence rate because the treating ophthalmologist was responsible for the light fluence rate being applied to the study participant’s retina. Only the study coordinators and any other person who might assist in the setup of verteporfin or placebo solutions were aware of the treatment assignment with respect to verteporfin or placebo; these individuals were trained to make every reasonable attempt to maintain masking of participating patients and all other study personnel. …..’. Hence, the investigators made efforts to blind patients and observers to the treatment allocation.

Summarise the evidence and perform appropriate statistical analysis

The purpose of a meta-analysis is to pool numerical evidence from individual studies to obtain a more precise overall estimate, by increasing statistical power. The first step is to enter the available data from each study into a summary table (again, this is often done by two or more researchers independently to avoid errors). The summary table contains the citation, key aspects of the study design, and a short description of the data extracted. This includes the estimate of the outcome of interest and its variance (or standard error). These estimates are then combined using appropriate statistical methods.

In the example below we focus on interpreting the statistical analysis in the systematic review of intervention studies examining the effect of photodynamic therapy following verteporfin injection (PDT group) versus photodynamic therapy following intravenous 5% dextrose in water (placebo/control group) for the management of neovascular age-related macular degeneration.14 Figure 2 is taken from this review and shows a typical graphical presentation of the data. On the far left of the figure are the four studies that contributed to this particular meta-analysis, with the year they were published; TAP, VIM, VIP and VIO are the study acronyms. For each study the figure gives the number of ‘events’ and the total number of participants in each group. For instance, in the TAP study 156 out of 402 participants who received photodynamic therapy with verteporfin (PDT) had the outcome of interest (i.e., loss of three lines or more in visual acuity at 12 months). Hence, the risk (expressed as a fraction) of losing three lines or more in visual acuity at 12 months in the PDT group is 156/402 (0.39). In the placebo group the risk of losing three lines or more is 111/207 (0.54).

Figure 2.

 Forest plot of comparison of photodynamic therapy with verteporfin versus placebo. The outcome is: Loss of three or more lines (15 or more letters) visual acuity at 12 months. This figure has been reproduced from Wormald et al.14 (with permission from John Wiley & Sons Ltd).

The next column labelled ‘weight’ gives the statistical weight each study contributes to the meta-analysis. The TAP study is the largest study and carries the most weight and contributes 38.9% to the meta-analysis, whereas the VIM study is the smallest and contributes considerably less statistical weight, only 5.4%. The amount of weight each study contributes is determined by the type of meta-analysis undertaken (see below).

The effect of PDT versus placebo is summarised for each study in the column labelled ‘Risk Ratio’. The risk ratio is the risk of visual loss in the PDT group divided by the risk of visual loss in the placebo group. For the TAP study this is (156/402) ÷ (111/207) = 0.72. For each risk ratio the 95% confidence interval (95% CI) is given, which indicates how precise the risk ratio estimate from each study is. A 95% confidence interval is calculated as the estimate of the outcome of interest ± (1.96 × its standard error). For ratio measures such as the risk ratio, this calculation is done on the logarithmic scale and the limits are then back-transformed, which is why the interval is not symmetrical around the risk ratio. Informally, the 95% confidence interval is interpreted as a range of values within which we are 95% certain the true value of the outcome of interest lies. The ‘true value’ refers to the value we would expect to see in the whole population to whom the results apply. For the TAP study the 95% CI is 0.61 to 0.86, which can be interpreted as follows: ‘we are 95% sure that the true risk ratio in the population of similar patients lies between 0.61 and 0.86’. As can be seen, the range of values encompassed by the 95% CI varies across the four studies.
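The TAP calculation can be reproduced with the usual large-sample formula for the standard error of the log risk ratio, SE = √(1/a − 1/n₁ + 1/c − 1/n₂). Here a of n₁ participants in one group and c of n₂ in the other have the event. This is a sketch of that textbook method, not a claim about the exact routine used to produce Figure 2, although with the TAP counts it gives the published values.

```python
import math


def risk_ratio(events_1, n_1, events_2, n_2, z=1.96):
    """Risk ratio of group 1 vs group 2 with a 95% CI computed on the log scale."""
    rr = (events_1 / n_1) / (events_2 / n_2)
    # Large-sample standard error of log(RR)
    se_log = math.sqrt(1 / events_1 - 1 / n_1 + 1 / events_2 - 1 / n_2)
    lower = math.exp(math.log(rr) - z * se_log)  # back-transform the log-scale limits
    upper = math.exp(math.log(rr) + z * se_log)
    return rr, lower, upper, se_log


# TAP study counts from Figure 2: PDT 156/402, placebo 111/207.
rr, lower, upper, se_log = risk_ratio(156, 402, 111, 207)
print(f"RR = {rr:.2f} (95% CI {lower:.2f} to {upper:.2f})")
```

This prints `RR = 0.72 (95% CI 0.61 to 0.86)`. The limits are not equidistant from 0.72 because they are symmetric only on the log scale.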

Types of meta-analysis

We can combine the risk ratios and their 95% CIs into a single result by a statistical process called meta-analysis. Broadly, there are two methods of pooling estimates in a meta-analysis: a fixed effect model or a random effects model.

A fixed effect meta-analysis assumes that all the individual studies are estimating the same underlying effect. The numerical results from each study are averaged, weighting each study according to some attribute reflecting study size or precision. Usually each study is weighted by the inverse of its variance.21 A precise study has a small standard error, and therefore a narrow confidence interval. It will carry more weight in the meta-analysis than an imprecise study with a large standard error and consequently a wide confidence interval.

A random effects meta-analysis assumes that the individual study results come from a ‘distribution of effects’. It therefore takes into account the between-study variation in estimates as well as the within-study variance.22 Hence, each study is weighted by the inverse of the sum of its within-study variance and the between-study variance. Consequently, in a random effects meta-analysis smaller (and usually less precise) studies carry relatively more weight than in the corresponding fixed effect meta-analysis.

If the results from the individual studies are reasonably homogeneous, a fixed effect meta-analysis may be undertaken. However, if there is heterogeneity across the study results, a random effects meta-analysis is usually undertaken and the reasons for heterogeneity are explored.13
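A minimal sketch of the two pooling approaches on the log risk ratio scale follows, using four hypothetical studies. The fixed effect estimate weights each study by 1/SE². The random effects estimate uses the DerSimonian and Laird method of moments to estimate the between-study variance (tau²), and weights each study by 1/(SE² + tau²). DerSimonian and Laird is one common estimator among several, and nothing here assumes it is the method used for Figure 2.

```python
import math

# Hypothetical studies: (log risk ratio, standard error of the log risk ratio).
studies = [(-0.35, 0.10), (-0.10, 0.15), (-0.25, 0.20), (0.25, 0.30)]


def fixed_effect(data):
    """Inverse-variance fixed effect estimate, its SE, Cochran's Q and weights."""
    w = [1 / se ** 2 for _, se in data]
    pooled = sum(wi * y for wi, (y, _) in zip(w, data)) / sum(w)
    q = sum(wi * (y - pooled) ** 2 for wi, (y, _) in zip(w, data))
    return pooled, math.sqrt(1 / sum(w)), q, w


def random_effects_dl(data):
    """DerSimonian-Laird random effects estimate, its SE, tau^2 and weights."""
    _, _, q, w = fixed_effect(data)
    df = len(data) - 1
    c = sum(w) - sum(wi ** 2 for wi in w) / sum(w)
    tau2 = max(0.0, (q - df) / c)  # between-study variance, truncated at zero
    w_re = [1 / (se ** 2 + tau2) for _, se in data]
    pooled = sum(wi * y for wi, (y, _) in zip(w_re, data)) / sum(w_re)
    return pooled, math.sqrt(1 / sum(w_re)), tau2, w_re


def report(label, pooled, se, weights):
    """Print the pooled risk ratio, its 95% CI and percentage weights."""
    lower, upper = math.exp(pooled - 1.96 * se), math.exp(pooled + 1.96 * se)
    pct = [round(100 * wi / sum(weights), 1) for wi in weights]
    print(f"{label}: RR {math.exp(pooled):.2f} "
          f"(95% CI {lower:.2f} to {upper:.2f}), weights % {pct}")


fe_est, fe_se, q, fe_w = fixed_effect(studies)
re_est, re_se, tau2, re_w = random_effects_dl(studies)
report("Fixed effect", fe_est, fe_se, fe_w)
report("Random effects", re_est, re_se, re_w)
print(f"Q = {q:.2f}, tau^2 = {tau2:.4f}")
```

With these inputs tau² is positive. The random effects weights are therefore spread more evenly across studies, and its confidence interval is wider than the fixed effect one. When tau² is zero, the two approaches give identical results.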

When data from all four studies are combined by meta-analysis the pooled risk ratio is 0.80 and the corresponding 95% CI is from 0.69 to 0.93 as indicated by the row labelled ‘Total (95% CI)’.

This result can be interpreted as follows: ‘the risk of losing three lines or more in visual acuity at 12 months in the PDT group is 0.8 times the risk in the placebo group, which is equivalent to a 20% reduction in risk. The 95% CI tells us that we can be 95% sure that the true risk ratio in the population of similar patients lies between 0.69 and 0.93, i.e. a reduction in the risk of visual loss of between 7% and 31%’.
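The percentage reduction in risk is 100 × (1 − RR). When applied to the pooled result, the upper confidence limit of the risk ratio gives the smallest reduction and the lower limit the largest, as the short sketch below shows.

```python
def relative_risk_reduction(rr):
    """Percentage reduction in risk implied by a risk ratio (negative if RR > 1)."""
    return 100 * (1 - rr)


# Pooled risk ratio and 95% CI from Figure 2.
pooled_rr, lower, upper = 0.80, 0.69, 0.93

print(f"Risk reduction {relative_risk_reduction(pooled_rr):.0f}% "
      f"(95% CI {relative_risk_reduction(upper):.0f}% "
      f"to {relative_risk_reduction(lower):.0f}%)")
```

This prints `Risk reduction 20% (95% CI 7% to 31%)`.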

Finally, on the far right is the forest plot, so called because a typical plot appears as a ‘forest of lines’.23 It is a visual presentation of the numerical data already described. The horizontal axis represents the risk ratio scale. Some points worth noting about the risk ratio scale:

  1. The vertical line at a risk ratio of 1.0 is the line of ‘no effect’, meaning that the risk of visual loss in the PDT group is the same as that in the placebo group, i.e. (risk PDT) ÷ (risk placebo) = 1. This is also sometimes referred to as the null effect or null line.
  2. ‘Favours PDT’ refers to risk ratios less than one, i.e., (risk PDT) ÷ (risk placebo) < 1, meaning that the risk of visual loss is lower in the PDT group than in the placebo group.
  3. ‘Favours placebo’ refers to risk ratios greater than one, i.e., (risk PDT) ÷ (risk placebo) > 1, meaning that the risk of visual loss is higher in the PDT group than in the placebo group.
  4. Risk ratios are usually plotted on a logarithmic rather than an arithmetic scale, as in this example. On a logarithmic scale, a halving of risk (risk ratio 0.5) and a doubling of risk (risk ratio 2.0) lie the same distance from the line of no effect.

Each study-specific risk ratio is plotted as a solid square, the size of which is proportional to the weight of the study. The horizontal line through each square represents the 95% CI. The VIM study has the smallest square and so carries the least weight in the meta-analysis. It also has the widest 95% CI, meaning it is the least precise study.

It is worth noting that all of the squares lie to the left of the line of ‘no effect’, as the risk ratios for each study are all < 1.0. This suggests a lower risk of visual loss with PDT than with placebo. However, the 95% CI lines for three of the studies straddle the line of ‘no effect’, because the upper end of the 95% CI for the risk ratio is > 1.0. Hence, for three of the four studies, the 95% CI includes the possibility of no benefit, or even a higher risk of visual loss with PDT compared with placebo. On their own, therefore, these three studies are inconclusive as to the true effect of PDT versus placebo. The TAP study shows stronger evidence of a beneficial effect of PDT over placebo: its risk ratio is below 1.0 and its 95% CI does not include 1.0, i.e., the upper end of the confidence interval is 0.86, which is < 1.0. The combined meta-analytic estimate is shown on the forest plot by the solid diamond. The centre of the diamond is at 0.80, and its tips represent the 95% CI limits, at 0.69 and 0.93.
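To show why a logarithmic axis is used, the text-only sketch below lays out forest plot rows for the TAP study and the pooled estimate quoted above. Each ratio is mapped to a character position on a log scale running from 0.5 to 2. The axis range, symbols and layout are arbitrary choices for illustration. Unlike a real forest plot, the markers are not scaled by study weight.

```python
import math

# Rows from Figure 2: (label, risk ratio, lower 95% limit, upper 95% limit).
rows = [("TAP", 0.72, 0.61, 0.86), ("Total", 0.80, 0.69, 0.93)]

AXIS_MIN, AXIS_MAX, WIDTH = 0.5, 2.0, 41  # illustrative log-scale axis


def position(ratio):
    """Map a ratio to a character column on a logarithmic axis (clamped to the axis)."""
    span = math.log(AXIS_MAX) - math.log(AXIS_MIN)
    frac = (math.log(ratio) - math.log(AXIS_MIN)) / span
    return min(WIDTH - 1, max(0, round(frac * (WIDTH - 1))))


def forest_row(rr, lower, upper, marker):
    """One plot row: CI as dashes, the no-effect line as '|', estimate as marker."""
    row = [" "] * WIDTH
    for col in range(position(lower), position(upper) + 1):
        row[col] = "-"
    row[position(1.0)] = "|"
    row[position(rr)] = marker
    return "".join(row)


for label, rr, lower, upper in rows:
    marker = "*" if label == "Total" else "#"  # '*' stands in for the pooled diamond
    print(f"{label:<6}{forest_row(rr, lower, upper, marker)}  "
          f"{rr:.2f} ({lower:.2f} to {upper:.2f})")
# Axis labels at 0.5, 1 and 2: equally spaced because the scale is logarithmic.
print(" " * 6 + "0.5" + " " * (position(1.0) - 3) + "1"
      + " " * (WIDTH - position(1.0) - 2) + "2")
```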

Heterogeneity between studies is assessed by calculating a chi-square statistic (χ²). In Figure 2, ‘Tau²’ is the estimated between-study variance. It is usually difficult to interpret, so the usual focus is the p-value for the chi-square test of heterogeneity. In this example χ² = 4.26 (with 3 degrees of freedom, one fewer than the number of studies) and p = 0.23. There is therefore no statistically significant heterogeneity in this meta-analysis, so a fixed effect model could be used. The authors chose a priori to present their results in a more conservative light by using a random effects meta-analysis, hence the heading ‘Random’ under the ‘Risk Ratio’ heading. However, in the absence of statistically significant heterogeneity, fixed effect and random effects meta-analyses usually give very similar results, although the latter usually gives slightly wider confidence intervals.

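A value ‘I² = 30%’ is also presented. The I² statistic is another measure of heterogeneity: it gives the percentage of the total variation across study estimates that is due to heterogeneity rather than chance. It is calculated from the chi-square statistic (Q) and its degrees of freedom (df) as I² = 100% × (Q − df)/Q, and is set to zero if this is negative. Here (4.26 − 3)/4.26 ≈ 30%. In this example, therefore, about 30% of the variation between study estimates is due to true heterogeneity rather than chance.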
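The p-value and I² reported in Figure 2 can be checked from χ² = 4.26 with 3 degrees of freedom. The sketch below uses the closed-form upper-tail probability of the chi-square distribution for integer degrees of freedom, to avoid needing a statistics library. It then applies I² = 100% × (Q − df)/Q, truncated at zero.

```python
import math


def chi2_sf(x, df):
    """P(X >= x) for a chi-square variable with integer df (closed form)."""
    h = x / 2
    if df % 2 == 0:
        return math.exp(-h) * sum(h ** k / math.factorial(k) for k in range(df // 2))
    total = math.erfc(math.sqrt(h))  # the df = 1 term
    for k in range(1, (df + 1) // 2):  # extra terms for df = 3, 5, ...
        total += math.exp(-h) * h ** (k - 0.5) / math.gamma(k + 0.5)
    return total


def i_squared(q, df):
    """I^2 (%) = 100 * (Q - df) / Q, truncated at zero."""
    return max(0.0, (q - df) / q) * 100 if q > 0 else 0.0


q, df = 4.26, 3  # heterogeneity statistic from Figure 2 (four studies)
print(f"p = {chi2_sf(q, df):.2f}, I^2 = {i_squared(q, df):.0f}%")
```

This prints `p = 0.23, I^2 = 30%`, matching the values shown in Figure 2.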

The final result, the ‘Test for overall effect’, gives a Z statistic and corresponding p-value for testing the pooled effect against the null hypothesis. In this example the null hypothesis is that the risk of losing three lines of visual acuity at 12 months in the PDT group is the same as that in the placebo group, i.e., that the risk ratio is 1.0. The p-value is very small (0.004). This means that, if the null hypothesis were true, the probability of observing a pooled risk ratio at least as far from 1.0 as 0.80 would be 4 in 1000. This is viewed as unlikely, so we reject the null hypothesis. We conclude that, in the population of similar patients, the risk of losing three lines of visual acuity at 12 months is lower in the PDT group than in the placebo group.
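The Z statistic and two-sided p-value for the null hypothesis (risk ratio = 1.0) can be approximately recovered from the pooled risk ratio and its 95% CI. This assumes that the pooled interval was computed on the log scale as estimate ± 1.96 × SE, as sketched below.

```python
import math


def z_test_from_ci(estimate, lower, upper, z_crit=1.96):
    """Z statistic and two-sided p-value for H0: ratio = 1, from the ratio's 95% CI."""
    se_log = (math.log(upper) - math.log(lower)) / (2 * z_crit)  # SE on the log scale
    z = math.log(estimate) / se_log
    p = math.erfc(abs(z) / math.sqrt(2))  # equals 2 * (1 - Phi(|z|))
    return z, p


z, p = z_test_from_ci(0.80, 0.69, 0.93)  # pooled result from Figure 2
print(f"Z = {z:.2f}, p = {p:.3f}")
```

With the rounded values from Figure 2 this gives a Z of roughly −2.9 and a p-value of about 0.003, close to the reported 0.004. Exact agreement is not expected, because the published estimate and limits are rounded.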

Interpret findings and present a balanced and impartial summary acknowledging the role of systematic error (bias) and random variation (chance)

When interpreting the findings from any review it is important to consider whether there is any potential for bias in the individual studies. If so, it matters whether the bias affects a subset of studies that can be excluded or the studies as a whole. As detailed above, several features of the overall estimate need to be considered: its size, the width of its confidence interval, and whether the interval includes the null value (or how far it lies from it). In the example above, the authors concluded that ‘….Photodynamic therapy in people with choroidal neovascularisation due to AMD is effective in preventing clinically significant visual loss with a relative risk reduction of approximately 20%.’14 Other outcomes were considered in this systematic review, and further details are available in the article for the interested reader.14 The p-value for the lower risk of visual loss among those receiving PDT compared with placebo is small (p = 0.004). This finding is therefore unlikely to be explained by chance alone, and both ends of the 95% CI are consistent with a beneficial effect. Thus random error is not a major concern. The authors also excluded studies with poor or unclear methodology with regard to randomisation, blinding and patient follow-up, so the findings are unlikely to be seriously affected by bias.

Systematic reviews of observational studies

So far we have described the findings from a systematic review of intervention studies but, as noted previously, systematic reviews can also pool evidence from observational studies. This is often more complex, because the degree of, or potential for, heterogeneity across observational studies is usually greater. More sophisticated statistical approaches are then needed to explain or allow for the factors related to it.5,6,9 We recently completed a systematic review of the prevalence of late age-related macular degeneration (AMD) in populations of white European ancestry.6 It showed a large degree of variation in prevalence estimates across individual studies. A forest plot of the study-specific estimates of overall prevalence is reproduced in Figure 3 and shows that the variation between studies is considerable. The chi-square statistic for heterogeneity was highly statistically significant (p < 0.001) and I² was 97%, meaning that 97% of the variation between studies was due to true variation rather than chance alone. Therefore a random effects pooled estimate of prevalence, 3.2% (95% CI 2.6–3.7%), is reported in Figure 3. We found that the age profile of participants and the methods used to define AMD within each study accounted for a large proportion of the heterogeneity between studies. We therefore performed a type of meta-analysis called meta-regression, which took into account factors related to study design and the age distribution of participants in each study. This approach allowed prevalence estimates to be obtained for each year of age. These estimates were standardised to the studies using the most rigorous methods of defining cases of AMD, i.e., a recognised international classification for AMD with fundus imaging. This accounted for 70% of the variation between studies, giving more reliable estimates of prevalence.
Table 1 from this paper shows the age-specific prevalence of late AMD by subtype with Bayesian 95% credible intervals; these are interpreted in the same way as the 95% CIs described above. For example, at 85 years of age the prevalence of late AMD is 10.9%, and we can be 95% sure that the true population prevalence at this age lies between 7.7% and 14.8%. These findings apply to populations of white European ancestry.
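The heterogeneity statistics and random effects pooling described above can be sketched in a few lines. This minimal Python illustration computes Cochran's Q, I² = (Q − df)/Q and the DerSimonian-Laird random effects estimate on the log-odds scale, then back-transforms the result to a prevalence. The five study estimates are hypothetical and are not the AMD data. The names `pool_random_effects` and `to_prevalence` are our own. Published reviews, including the AMD review, may use other estimators or fully Bayesian models.

```python
import math

def pool_random_effects(estimates, std_errors, z_crit=1.959964):
    """DerSimonian-Laird random effects meta-analysis.

    estimates  -- study estimates on an additive scale (e.g. log odds)
    std_errors -- their within-study standard errors
    """
    k = len(estimates)
    w = [1.0 / se ** 2 for se in std_errors]              # fixed-effect (inverse-variance) weights
    sum_w = sum(w)
    fixed = sum(wi * yi for wi, yi in zip(w, estimates)) / sum_w

    # Cochran's Q: weighted squared deviations from the fixed-effect estimate
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, estimates))
    df = k - 1
    # I^2: share of total variation attributed to true between-study differences
    i2 = 100.0 * (q - df) / q if q > df else 0.0

    # Between-study variance tau^2 (method of moments, truncated at zero)
    c = sum_w - sum(wi ** 2 for wi in w) / sum_w
    tau2 = max(0.0, (q - df) / c) if c > 0 else 0.0

    # Random effects weights add tau^2 to each study's within-study variance
    w_re = [1.0 / (se ** 2 + tau2) for se in std_errors]
    sum_w_re = sum(w_re)
    pooled = sum(wi * yi for wi, yi in zip(w_re, estimates)) / sum_w_re
    se_pooled = math.sqrt(1.0 / sum_w_re)
    return {
        "fixed": fixed, "se_fixed": math.sqrt(1.0 / sum_w),
        "Q": q, "I2": i2, "tau2": tau2,
        "pooled": pooled, "se_pooled": se_pooled,
        "ci": (pooled - z_crit * se_pooled, pooled + z_crit * se_pooled),
    }

def to_prevalence(logit):
    return 1.0 / (1.0 + math.exp(-logit))                 # back-transform log odds to a proportion

# Hypothetical log odds of prevalence (and standard errors) from five studies
log_odds = [-4.0, -3.2, -3.6, -2.8, -3.9]
ses = [0.10, 0.12, 0.15, 0.09, 0.20]
res = pool_random_effects(log_odds, ses)

lo, hi = res["ci"]
print(f"Q = {res['Q']:.1f}, I2 = {res['I2']:.0f}%")
print(f"Random effects prevalence {100 * to_prevalence(res['pooled']):.1f}% "
      f"(95% CI {100 * to_prevalence(lo):.1f}-{100 * to_prevalence(hi):.1f}%)")
```

When heterogeneity is present (tau² > 0), every study's weight shrinks, so the random effects CI is wider than the fixed-effect CI. This is why a random effects estimate is preferred when I² is as high as in the AMD example.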

Figure 3.

Prevalence estimates of late age-related macular degeneration (AMD) in populations of white European ancestry, in ascending order of mean age at outcome. Closed diamonds show the prevalence of late AMD for each study and horizontal lines the 95% CI (the study by Gibson is truncated because of its exceptionally high prevalence). The pooled estimate from a random effects meta-analysis is shown by a dashed vertical line and an open diamond (95% CI). Heterogeneity: I² = 97% (p < 0.0001). This figure is reproduced from Rudnicka et al.6 (with permission from Elsevier).

Table 1. Estimated prevalence according to age and age related macular degeneration type in populations of European ancestry6

             Predicted prevalence, % (95% CrI(a))
Age (years)  Late AMD             GA                   NVAMD
50           0.08 (0.05–0.12)     0.04 (0.02–0.07)     0.04 (0.02–0.07)
55           0.16 (0.11–0.24)     0.08 (0.05–0.13)     0.08 (0.05–0.13)
60           0.33 (0.22–0.48)     0.16 (0.10–0.26)     0.17 (0.11–0.25)
65           0.67 (0.46–0.96)     0.34 (0.22–0.51)     0.34 (0.23–0.49)
70           1.38 (0.95–1.95)     0.70 (0.46–1.02)     0.70 (0.48–0.97)
75           2.80 (1.95–3.91)     1.43 (0.98–2.05)     1.40 (0.99–1.90)
80           5.60 (3.92–7.73)     2.91 (2.00–4.14)     2.79 (1.99–3.79)
85           10.88 (7.70–14.81)   5.81 (4.00–8.29)     5.48 (3.91–7.45)
90           20.10 (14.52–26.60)  11.29 (7.75–16.10)   10.49 (7.45–14.37)

AMD, age related macular degeneration; GA, geographic atrophy; NVAMD, neovascular age related macular degeneration.
Prevalence estimates are based on using the International Classification or Wisconsin Age-Related Maculopathy Grading system together with fundus photography/imaging.
(a) Bayesian 95% credible interval.
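The steep rise with age in Table 1 can be quantified with simple arithmetic on the tabulated late AMD prevalences. The short Python sketch below converts each prevalence to odds and computes the odds ratio between successive 5-year age bands. The ratios all come out close to 2, meaning the modelled odds of late AMD roughly double with every 5 years of age. This is only a descriptive check of the published numbers. It does not reproduce the authors' Bayesian meta-regression and makes no assumption about the exact form of that model.

```python
# Late AMD prevalence (%) by age, copied from Table 1
ages = [50, 55, 60, 65, 70, 75, 80, 85, 90]
prev_pct = [0.08, 0.16, 0.33, 0.67, 1.38, 2.80, 5.60, 10.88, 20.10]

def odds(p):
    return p / (1.0 - p)                      # convert a proportion to odds

odds_ratios = []
for a0, a1, p0, p1 in zip(ages, ages[1:], prev_pct, prev_pct[1:]):
    ratio = odds(p1 / 100) / odds(p0 / 100)   # odds ratio for a 5-year increase in age
    odds_ratios.append(ratio)
    print(f"{a0}->{a1} years: odds ratio {ratio:.2f}")
```

On the prevalence scale the ratio between the oldest two bands is smaller (20.10/10.88 ≈ 1.85), because prevalence cannot keep doubling as it approaches higher values. The odds scale is therefore the more natural one for describing this age trend.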

Summary

Systematic reviews and meta-analysis can be extremely useful in bringing together evidence in a clear and unbiased way, provided the component studies are well designed and executed. A systematic review is usually less time consuming to complete than a primary research study, but not all systematic reviews are rigorous and unbiased. It may be difficult to obtain the necessary data or information from published sources alone. Further information or clarification can be sought from the investigators of the primary studies, but this may not always be possible. Similarly, reviews based on individual patient data may be biased if they include only a subset of the relevant studies identified and that subset is atypical of the remainder. It is therefore important to critically appraise a systematic review.20 Each stage of the review process needs to be clear and transparent, and any deviations from the original protocol justified. In this article we have highlighted the rigor with which systematic reviews and meta-analyses should be conducted. The interpretation of pooled estimates should not rely on statistical significance alone. The 95% CI associated with any pooled estimate should also be interpreted, since its limits provide important information about the potential clinical implications of the findings. In the example of photodynamic therapy for neovascular AMD, the pooled estimate suggested a 20% reduction in the risk of visual loss. The corresponding 95% CI, however, was compatible with a risk reduction of as little as 7% or as much as 31%. This is quite a wide range of potential therapeutic benefit of PDT over placebo, but both ends of the CI are associated with benefit. Difficulties arise when the two ends of the 95% CI have different clinical implications, i.e., one end suggests benefit but the other suggests harm (as was observed for some of the individual trials in the PDT review).
This usually indicates that there is insufficient evidence to conclude one way or the other and that more evidence (i.e., more primary studies) is needed. Guidelines for best practice receive increasing attention, and all healthcare professionals need to keep abreast of current best knowledge. As a result, there is a growing need for rigorous systematic reviews and meta-analyses. Any recommendations resulting from systematic reviews should consider the strengths and weaknesses of the evidence included. A common problem in undertaking a systematic review is the lack of well-designed primary studies to include; however, highlighting the need for further primary studies in a given area is equally important. Systematic reviews and meta-analyses are powerful tools that should replace traditional reviews so that conclusions are balanced, unbiased and based on the best available evidence.
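The reasoning above about the two ends of a 95% CI can be written as a simple rule for a ratio measure such as a relative risk. In this case a value below 1 means lower risk with treatment, which the sketch assumes is beneficial. The PDT figures used (RR 0.80, 95% CI 0.69 to 0.93) are inferred from the 20% (7% to 31%) risk reduction quoted above rather than taken from the original review. The helper names and the second, crossing interval are hypothetical and serve only as illustrations.

```python
def relative_risk_reduction(rr):
    """Relative risk reduction (%) corresponding to a relative risk."""
    return 100.0 * (1.0 - rr)

def interpret_ratio_ci(lower, upper, null=1.0):
    """Classify a CI for a ratio measure where values below `null` favour treatment."""
    if upper < null:
        return "both ends compatible with benefit"
    if lower > null:
        return "both ends compatible with harm"
    return "inconclusive: CI spans benefit and harm"

# Assumed PDT summary: RR 0.80 (95% CI 0.69 to 0.93)
rr, lo, hi = 0.80, 0.69, 0.93
# The upper RR limit gives the smallest risk reduction and the lower limit the largest
print(f"RRR {relative_risk_reduction(rr):.0f}% "
      f"(95% CI {relative_risk_reduction(hi):.0f}% to {relative_risk_reduction(lo):.0f}%)")
print(interpret_ratio_ci(lo, hi))

# A hypothetical single trial whose CI crosses the null value of 1
print(interpret_ratio_ci(0.70, 1.15))
```

A rule like this only captures statistical direction. Whether the smallest plausible benefit (here about 7%) is clinically worthwhile still has to be judged separately.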

Conflict of interest

None of the authors has any financial or other conflicting interests to disclose.

Appendix


Dr Alicja Rudnicka qualified as an optometrist in 1987 and completed a PhD in 1994 (City University, London and Moorfields Eye Hospital, London). She undertook an MSc in Medical Statistics at the London School of Hygiene and Tropical Medicine and is currently Senior Lecturer in Medical Statistics at St George’s, University of London. She is responsible for curriculum development of epidemiology and evidence based medicine on the undergraduate MBBS and postgraduate MSc courses. Her research focuses on cardiovascular epidemiology, objective measures of physical activity and ophthalmic epidemiology as well as systematic reviews and meta-analyses.


Dr Christopher Owen trained as an optometrist at City University. He completed an MSc in Epidemiology at the London School of Hygiene and Tropical Medicine, and is currently Senior Lecturer in Epidemiology. He is responsible for the conception, development, conduct, analysis and writing up of epidemiological research and for planning and providing teaching in epidemiology and public health to undergraduate medical students and to postgraduate students from a wide range of backgrounds. He has carried out a number of systematic reviews in the field of life course and ophthalmic epidemiology, which have been published in peer-reviewed journals.
