Meta-analysis: A need for well-defined usage in ecology and conservation biology

Meta-analysis is a powerful research summarization technique. In the medical field, for example, meta-analysis is an indispensable tool as part of systematic reviews for healthcare decision making. The advantages of meta-analysis have also been recognized in the fields of ecology and conservation biology, with the method becoming increasingly popular since the 1990s. "Meta-analysis", however, is not well-defined in these fields, but is regularly confused with other summary analysis techniques, such as multiple regression methods, vote-counting or other quantitative analyses. We argue that this vague and inconsistent utilization of the term is problematic, because a meta-analysis typically provides scientifically rigorous results. We therefore advocate a consistent and well-defined usage of the term in our disciplines, based on the standardized definition applied in the medical sciences. We searched the Web of Knowledge for meta-analyses in the subject area "biodiversity conservation" and evaluated the usage of the term "meta-analysis". Based on meta-analysis literature from the medical sciences, we determined steps that in our opinion are mandatory when performing a meta-analysis and rated articles according to these steps. In the first round of rating, we assessed the usage of four "technical" steps that are normally applied in meta-analytical software. In the second round, we evaluated only the highly rated articles from the first round, considering three steps regarding more qualitative aspects of interpretation and results presentation. Of the 133 articles evaluated in the first round, only 45% fulfilled all technical requirements for a meta-analysis, while 25% did not fulfill any of the requisite steps. In the second round, only one article of 83 fulfilled all requisite steps, while 22% did not fulfill any requirement.
Our findings highlight the ambiguous and vague usage of the term "meta-analysis" in ecology and conservation biology and underline the importance of a consistent and clear definition. We conclude with recommendations on how the term should be applied in the future.


INTRODUCTION
Meta-analysis is a statistical method that summarizes the results from at least two different studies (Higgins and Green 2011, Chapter 9).
The benefits of meta-analysis are higher statistical power and better precision, as well as the ability to address a broader scope than the combined primary studies (Higgins and Green 2011, Chapter 9), making meta-analysis a powerful statistical method for summarizing research findings across studies. Meta-analysis was first developed and applied in psychology (Glass 1976) and the medical sciences (Borenstein et al. 2009: xxiii), and has since become widely used in these fields (Sutton and Higgins 2008). In the medical sciences, for example, meta-analysis is part of systematic reviews, an indispensable tool for ensuring effective medical treatment. By summarizing and analyzing primary research in systematic reviews and meta-analyses, standard guidelines can be developed that directly benefit patient care. The Cochrane Collaboration, a well-known and internationally recognized expert network, is dedicated to providing high-quality research evidence for healthcare decision making (The Cochrane Collaboration 2012). In the fields of ecology and conservation biology, the advantages of meta-analysis and systematic review are also being recognized (Gurevitch and Hedges 2001, Cadotte et al. 2012). The benefits of meta-analysis for ecological research were already being highlighted in the 1990s (Fernandez-Duque and Valeggia 1994, Arnqvist and Wooster 1995, Gurevitch and Hedges 1999), and the Centre for Evidence-Based Conservation later followed the example of the Cochrane Collaboration by promoting systematic reviews in conservation and environmental management (Centre for Evidence-Based Conservation 2012).
Meta-analysis is sometimes confused with systematic reviews (Nakagawa and Poulin 2012, The Cochrane Collaboration 2012), but in fact meta-analysis, as a statistical summary technique, is part of a systematic review (Fig. 1). A systematic review, in contrast to a traditional narrative literature review, requires a clearly formulated research question, an extensive literature search that ideally includes relevant unpublished research findings, transparent study inclusion and exclusion criteria, a quantitative synthesis of the data (normally by a meta-analysis), and interpretation of the results (Borenstein et al. 2009: xxiii, Centre for Evidence-Based Conservation 2012). A systematic review differs substantially from a narrative review in its transparency and replicability.
We are raising the issue of the vague and inconsistent use of the term "meta-analysis" (see also Côté and Reynolds 2012). The term is frequently confused with other summary analysis techniques, e.g., multiple regression methods, correlational studies, vote-counting or other quantitative analyses. "Meta-analysis" is improperly and unknowingly used for a whole range of summary techniques, either by the authors themselves (see below) or by others who label an article a "meta-analysis" even when its authors make no such claim; e.g., Lahti (2001) about Söderström (1999), Rudel et al. (2009) about Geist and Lambin (2002), or Ahumada et al. (2011) about Vetter et al. (2011). This imprecise and loose utilization of the term is problematic. As a meta-analysis typically provides scientifically rigorous results (The Cochrane Collaboration 2012), declaring a less powerful summary analysis technique to be a meta-analysis amounts to a form of deceptive packaging. Precise and unambiguous usage of the term would help avoid misinterpretations among scientists, by the public and by decision makers. We are thus advocating a consistent and well-defined usage of the term meta-analysis in our disciplines.
In this paper, we evaluate the usage of the term "meta-analysis" in the research fields of ecology and conservation biology, based on rules from meta-analysis literature in the medical sciences (Borenstein et al. 2009, Higgins and Green 2011). We rate articles according to their usage of the term, discuss reasons for the inconsistent terminology and highlight the importance of considering heterogeneity and presenting effect sizes for all single studies. We conclude with recommendations on how to apply "meta-analysis" in the future. We solely evaluate the usage of the term "meta-analysis" as a statistical summary technique and therefore do not consider issues related to systematic reviews (e.g., literature search, formulation of inclusion/exclusion criteria). Our rating score should not be misinterpreted as a quality label, but rather as a tool to help us evaluate the usage of the term meta-analysis.

METHODS
In August 2011 we searched the Web of Knowledge for article titles including "meta-analys* OR meta analys*", refined by the subject area "biodiversity conservation". We probably missed some articles on meta-analysis that do not include the term in the title, yet a topic search would have produced too many articles on other subjects (e.g., meta-population and analysis, meta-information and analysis, etc.). We excluded from our evaluation all articles that did not claim to perform a meta-analysis, i.e., articles on theoretical or methodological aspects of meta-analysis or responses to previous meta-analyses (see Appendix A). We further discarded conference proceedings when only abstracts were available.
We then evaluated and rated all remaining articles according to their usage of the term "meta-analysis". Since a list of predefined requisite steps for a meta-analysis does not exist in the literature, we determined steps that in our opinion are mandatory in a meta-analysis, based on meta-analysis literature from the medical sciences (Borenstein et al. 2009, Higgins and Green 2011). In the first round of rating, we assessed four steps that we call "technical" steps, because they represent procedures that are normally applied automatically in meta-analytical software. Only articles that were rated as "technically" complete meta-analyses in the first round entered the second rating round, where we assessed three more "qualitative" requirements regarding interpretation and presentation of results. The four requisite steps in the first rating included: (1) generating an effect size metric based on continuous data, binary data or correlations; (2) weighting effect sizes by sample size or precision; (3) pooling of effect sizes into a summary effect or reasoning against pooling (e.g., due to high variation between effect sizes); (4) calculating confidence intervals for each effect size and the summary effect. Articles could receive one point per item, i.e., a rating score between 0 and 4. We awarded half points in some cases, mostly when authors had applied a sound meta-analysis procedure but did not report whether effect sizes had been weighted, e.g., Arredondo-Núñez et al.
(2009). Each article that received a rating score of at least 3.5 was assessed in the second round, where we evaluated whether these articles also interpreted and discussed the results in a broader context by (1.1) quantifying total heterogeneity/variability (i.e., we did not count between-group heterogeneity) in effect sizes by an index measure (Higgins 2008, Higgins and Thompson 2002), (1.2) exploring existent heterogeneity/variability in effect sizes by considering explanatory variables (e.g., in subgroup analyses or meta-regressions) (Thompson and Higgins 2002, Higgins and Thompson 2004), and (2) presenting results in the form of a forest plot. We elaborate on the significance of heterogeneity and forest plots in meta-analysis in the Discussion section. Again, articles could receive one point per item (i.e., a rating score between 0 and 3) and half points in the case of a forest plot that did not display weights, e.g., Jactel et al. (2005). We further assessed the potential influence of publication year or a journal's impact factor (taken from Journal Citation Reports 2010 on the Web of Knowledge) on rating scores, applying Kendall's tau correlation coefficient in R (R Development Core Team 2009).
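As an illustration only (not part of our evaluation protocol), the four technical steps can be sketched in a few lines of Python for a fixed-effect, inverse-variance meta-analysis; all study values below are invented and do not come from any of the rated articles:

```python
import math

# Hypothetical per-study data (e.g., Hedges' d with standard errors);
# the numbers are purely illustrative.
effects = [0.52, 0.31, 0.78, 0.12, 0.45]   # step 1: one effect size per study
ses     = [0.20, 0.15, 0.30, 0.25, 0.18]   # their standard errors

# Step 2: weight each effect size by its precision (inverse variance).
weights = [1 / se**2 for se in ses]

# Step 3: pool the weighted effect sizes into a fixed-effect summary effect.
summary = sum(w * d for w, d in zip(weights, effects)) / sum(weights)

# Step 4: 95% confidence intervals for each effect size and for the summary.
cis = [(d - 1.96 * se, d + 1.96 * se) for d, se in zip(effects, ses)]
se_summary = math.sqrt(1 / sum(weights))
ci_summary = (summary - 1.96 * se_summary, summary + 1.96 * se_summary)

print(f"summary = {summary:.3f}, "
      f"95% CI = ({ci_summary[0]:.3f}, {ci_summary[1]:.3f})")
```

Note that the pooled standard error is smaller than that of any single study, which is the source of the gain in power and precision mentioned in the Introduction.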

RESULTS
Our literature search yielded 160 articles. We excluded 13 conference proceedings that only provided abstracts, 12 articles that did not conduct a meta-analysis but considered theoretical or methodological aspects of meta-analysis or were responses to previous meta-analyses, one article that was a meta-analysis of meta-analyses and one article that did not include "meta-analysis" in the title (see Appendix A).
In the first rating round, of the 133 articles, 60 (45%) fulfilled all technical requirements for a sound meta-analysis (score = 4) while 33 (25%) did not fulfill a single requirement (score = 0) (see Table 1 and Appendix B). Twenty-three (17%) articles fulfilled almost all requirements for a meta-analysis, but were ranked down (score = 3.5) because the authors did not report whether effect sizes had been weighted. In 12 (9%) articles the authors applied at least one step of meta-analysis (score = 1), mostly generating effect sizes, which is the first step in a meta-analysis. However, the effect sizes were then used for other quantitative calculations, not for a meta-analysis. In one article, in addition to calculating effect sizes, weighting was done by classes (Vanderwel et al. 2007; score = 1.5). There were two articles (Dupont et al. 2010, Hendriks et al. 2010; score = 3) in which the authors used the dimensionless ratio s (Gurevitch and Hedges 1993) as an effect size, which precludes weighting. In another article, which received a score of 3, effect sizes, confidence intervals and an overall effect were calculated (Johnston and Roberts 2009). In one article the authors did not weight effect sizes without stating a reason (Nichols et al. 2007; score = 3).
In the second rating round, we assessed the 83 articles with a minimum score of 3.5, i.e., those that had fulfilled the technical requirements for meta-analysis in the first rating round (see Table 1 and Appendix C). In 18 (22%) of these 83 articles, heterogeneity was not considered and no forest plot was presented. In 23 (28%) articles the authors quantified heterogeneity using an index measure and in 62 (75%) they explored heterogeneity by including explanatory variables. In only three (4%) articles were results presented in forest plots; in four (5%) more, the authors presented a forest plot without displaying weights, and in one (1%) study a forest plot was presented not for single studies but grouped by species. In the remaining 75 (90%) articles, results were not presented in forest plots. Only one article (Benítez-López et al. 2010) fulfilled all requirements of the second rating.
Of the articles we evaluated, the first was published in 1992 (Taylor and White 1992), but it does not fulfill the requisite steps for meta-analysis. The first articles fulfilling the technical requirements of meta-analysis are from 2002 (Ainsworth et al. 2002, Blenckner and Hillebrand 2002, Guo and Gifford 2002, Millar and Methot 2002), while the first article in which the authors also fully consider heterogeneity is from 2004 (Moore et al. 2004) and the first article in which results are presented in a forest plot is from 2005 (Kalcounis-Rüppell et al. 2005). The number of articles published using meta-analysis has increased considerably since 1992 and reached a peak of 26 publications in 2010 (Fig. 2). A journal's impact factor correlated positively with rating score 1 (Kendall's tau = 0.244, N = 124; Fig. 3) and also with the total rating score (Kendall's tau = 0.181, N = 124), but not with rating score 2 (Kendall's tau = −0.025, N = 81; Fig. 4), whereas publication year did not correlate with any rating score (Kendall's tau = 0.043, 0.031 and −0.043, respectively).

DISCUSSION
Meta-analysis has become a popular summary analysis technique in ecology and conservation biology, yet we have shown that the utilization of the term is inconsistent and that it is often confused with other types of quantitative analyses. Our results indicate that more than one-third of the articles with "meta-analysis" in their title did not fulfill the technical requirements of a sound meta-analysis according to the meta-analysis literature from the medical sciences (Borenstein et al. 2009, Higgins and Green 2011). In a large share of the articles, the authors did not consider the heterogeneity in effect sizes or present results in forest plots. Only one single article out of the 133 assessed articles reached a full rating score in the first and second round (Benítez-López et al. 2010). Our findings provide evidence of an ill-defined usage of the term "meta-analysis" in ecology and conservation biology and underline the importance of a consistent and unambiguous definition.

Limitations of our rating system
Our rating score refers solely to the usage of the term meta-analysis consistent with the standard literature from the medical sciences (Borenstein et al. 2009, Higgins and Green 2011). We do not make any statement on the overall quality of the studies, and our rating score should not be misinterpreted as a quality label, but rather as a tool to aid us in evaluating the usage of the term meta-analysis. We only evaluated whether a study applied the formal procedure of a meta-analysis; analyses or results may still be ambiguous. In our evaluation, we could not address the quality of the raw data, whether they were adequate for the question being asked, whether the generation of effect sizes had been done correctly, etc. Readers should therefore consider each meta-analysis critically and appraise its quality and validity. If a study in our evaluation received a rating score of 0, we do not claim that it is a poorly conducted study or that it applies flawed statistics. We only argue that this study, claimed to be a meta-analysis, is not a meta-analysis according to the standard literature from the medical sciences (Borenstein et al. 2009, Higgins and Green 2011).
We note that the classical approach for meta-analysis on which we based our evaluation (Borenstein et al. 2009, Higgins and Green 2011) is a frequentist approach. By contrast, in five of the rated articles a Bayesian approach was used (Millar and Methot 2002, Helser and Lai 2004, MacNeil and Graham 2010, Duncan et al. 2011, Mellin et al. 2011). We are aware of the many differences between the Bayesian and the frequentist approach and found it difficult to apply the same set of criteria to both kinds of meta-analyses. We decided to base our rating on the classical frequentist approach, which was far more common among the evaluated articles. As a consequence, the Bayesian approaches throughout received one rating point less in the second rating round (item 1.1), since these approaches commonly do not calculate the required index measures.

Fig. 2. Increase in publications since 1992 in which meta-analysis appears in the title.

Inconsistent terminology
It could be argued that the term "meta-analysis" has developed a different tradition of application and perception in ecology and conservation biology than in the medical sciences. Instead of the rigorous technique of the medical sciences, meta-analysis in ecology and conservation biology might rather refer to a more general type of analysis at a meta-level. To counter this argument, we elaborate on the evolution of the term "meta-analysis". A title search for "meta-analys* OR meta analys*" in the Web of Knowledge indicates that prior to its first appearance in 1977 the term had never been used with any other meaning. Smith and Glass (1977) published the first paper indexing "meta-analysis" in the title, which appeared in the journal American Psychologist. Refining the search to the subject category "environmental sciences ecology" shows that the first record with an ecological background is a meeting abstract from 1990 by Gurevitch et al. The first ecological research articles conducting meta-analyses were published in 1992 (Gurevitch et al. 1992, Vanderwerf 1992), both of which technically apply meta-analysis according to the standard literature from the medical sciences (Borenstein et al. 2009, Higgins and Green 2011). In addition, Gurevitch et al. (1992) explicitly consider and discuss heterogeneity. In spite of this exemplary meta-analysis from 1992, as the term subsequently came into fashion in the research areas of ecology and conservation biology and was used ever more frequently, its original meaning became obscured and confused.
Seemingly, researchers connect the term meta-analysis with the idea of some quantitative statistical calculations combining independent studies from the literature; an idea that is not wrong in itself, but too vague and not sufficient to qualify as a meta-analysis. Repeatedly, multiple regression or correlational studies were termed meta-analyses (Hartley and Hunter 1998, Benayas et al. 2009, Creel and Rotella 2010). Also, the vote-counting approach, which utilizes p-values instead of effect sizes, is commonly confused with meta-analysis (Chalfoun et al. 2002, Nájera and Simonetti 2010), although it has been pointed out several times that vote-counting is statistically problematic and may result in false conclusions (Gurevitch and Hedges 1999, Borenstein et al. 2009:251-255, Higgins and Green 2011: Chapter 9.4.11). The use of vote-counting is not recommended, except in the case of insufficient data and only if a null result is not interpreted as an absent effect (Borenstein et al. 2009:252). Almost all of the erroneously termed meta-analyses performed some form of quantitative analysis, and almost no narrative literature review claimed to be a meta-analysis (Sarma et al. 2010).
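Why vote-counting can mislead is easy to demonstrate with a deliberately simplified, invented example: five identical, underpowered studies each fail to reach significance on their own, so vote-counting reports zero "votes" for an effect, whereas inverse-variance pooling of the very same data detects the effect clearly.

```python
import math

# Five invented, identical small studies: same positive effect, low power.
d, se, n_studies = 0.4, 0.25, 5

# Vote-counting: count studies individually significant at p < 0.05.
z_single = d / se                        # z = 1.6, below the 1.96 threshold
significant_votes = n_studies if z_single > 1.96 else 0

# Inverse-variance pooling of the same five studies.
w = 1 / se**2                            # identical weight per study
se_pooled = math.sqrt(1 / (n_studies * w))
z_pooled = d / se_pooled                 # well above 1.96

print(f"vote-count: {significant_votes}/{n_studies} significant; "
      f"pooled z = {z_pooled:.2f}")
```

Interpreting the 0/5 vote-count as "no effect" would be exactly the false conclusion warned against by Borenstein et al. (2009:252).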

Heterogeneity
Meta-analysis allows us to calculate the magnitude rather than merely the existence of an effect; an important difference when we want to know whether, e.g., an intervention improves the habitat for an endangered species by 20% or by 80% (Borenstein et al. 2009:12). Moreover, meta-analysis offers the possibility to assess whether effect sizes are homogeneous across studies (Higgins 2008). If the effect sizes vary across studies, i.e., if there is heterogeneity, the interpretation of results will be substantially different than in the case of consistent effect sizes, e.g., if the intervention improves the habitat for the endangered species consistently by 50% or within a range from 10% to 90% (Borenstein et al. 2009:105). A meta-analysis offers formal methods to explore and measure heterogeneity, of which authors should make use. If variation between effect sizes is very high, the presentation of a summary effect might be inadequate. Subgroup analyses and meta-regression can help to explain existent heterogeneity by comparing the effect size between different subgroups and exploring the relationship between variables and effect sizes, respectively (Thompson and Higgins 2002, Borenstein et al. 2009:105, 378). Ecological studies can almost never be reproduced with identical results (Ellison 2010), making heterogeneity an issue in every ecological meta-analysis. In the face of the high complexity and heterogeneity in natural systems, it is worrying that most meta-analysts do not consider or even mention the heterogeneity/variability of effect sizes in their meta-analysis.
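As a minimal sketch of the index measures we required in item 1.1, the following Python fragment computes Cochran's Q and the I² index (Higgins and Thompson 2002) for a deliberately heterogeneous set of invented effect sizes; none of these numbers come from the rated articles.

```python
import math

# Invented per-study effect sizes and standard errors with
# deliberately large between-study spread.
effects = [0.95, 0.10, 0.80, -0.20, 0.55]
ses     = [0.20, 0.15, 0.30, 0.25, 0.18]
weights = [1 / se**2 for se in ses]
summary = sum(w * d for w, d in zip(weights, effects)) / sum(weights)

# Cochran's Q: weighted squared deviations of study effects from the summary.
Q = sum(w * (d - summary)**2 for w, d in zip(weights, effects))
df = len(effects) - 1

# I^2: percentage of total variability attributable to between-study
# heterogeneity rather than sampling error (bounded below at 0).
I2 = max(0.0, (Q - df) / Q) * 100 if Q > 0 else 0.0
print(f"Q = {Q:.2f} on {df} df, I^2 = {I2:.1f}%")
```

Here Q far exceeds its degrees of freedom and I² is near 80%, a situation in which reporting only the summary effect, without subgroup analyses or meta-regression, would be misleading.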

Forest plots
Forest plots are the standard format to present the results of a meta-analysis in the medical sciences, since they are very informative and intuitive (Borenstein et al. 2009:366-369).A forest plot holds important information apart from the summary effect, namely the individual effect sizes with confidence intervals of all studies included in the analysis and the weight by which the study was counted for the overall analysis.In ecological and conservation biological papers, instead, a kind of reduced forest plot has become common that displays no more than the summary effect with confidence interval.We contend that authors are holding back important information from their readers by only presenting reduced forms of forest plots, because a seemingly clear and significant summary effect might be composed of rather heterogeneous single effect sizes.Some may argue that forest plots are not feasible, because the information that must be plotted is too extensive.We suggest that at least the primary outcome should be presented in forest plot form and that very extensive forest plots could be included in online appendices (if neither option is possible, the information from the forest plot could at least be provided in form of a table with all effect sizes, corresponding confidence intervals and weights).A forest plot enables the reader to quickly assess the number of studies that form the summary effect, the precision of the included studies and the homogeneity/heterogeneity across effect sizes (Borenstein et al. 2009:366).The forest plot, thus, also provides a graphical overview of possible heterogeneity between effect sizes (Sutton and Higgins 2008).
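A rough text-only sketch shows how little is needed to report the full forest-plot information (per-study effect size, 95% CI, weight, plus the pooled summary); the study names and values are invented, and in practice one would use a plotting routine rather than character graphics.

```python
import math

# Invented studies: name -> (effect size, standard error).
studies = {"Study A": (0.52, 0.20), "Study B": (0.31, 0.15),
           "Study C": (0.78, 0.30), "Study D": (0.12, 0.25),
           "Study E": (0.45, 0.18)}

weights = {name: 1 / se**2 for name, (d, se) in studies.items()}
total_w = sum(weights.values())
summary = sum(weights[n] * d for n, (d, se) in studies.items()) / total_w
se_sum  = math.sqrt(1 / total_w)

def bar(d, lo, hi, lo_axis=-1.0, hi_axis=1.5, width=40):
    """Map an effect size and its CI onto a fixed-width character axis."""
    pos = lambda x: int((x - lo_axis) / (hi_axis - lo_axis) * (width - 1))
    line = [" "] * width
    for i in range(pos(lo), pos(hi) + 1):   # confidence interval
        line[i] = "-"
    line[pos(d)] = "*"                       # point estimate
    return "".join(line)

for name, (d, se) in studies.items():
    lo, hi = d - 1.96 * se, d + 1.96 * se
    print(f"{name:8s} {d:+.2f} [{lo:+.2f}, {hi:+.2f}] "
          f"({100 * weights[name] / total_w:4.1f}%) |{bar(d, lo, hi)}|")
lo, hi = summary - 1.96 * se_sum, summary + 1.96 * se_sum
print(f"{'Summary':8s} {summary:+.2f} [{lo:+.2f}, {hi:+.2f}] (100%) "
      f"|{bar(summary, lo, hi)}|")
```

Even in this crude form, a reader can see at a glance how many studies contribute, how precise each is, and how consistent the effect sizes are; exactly the information a summary-only plot hides.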
Although meta-analysis is a rigorous and valuable methodology, readers should critically appraise each meta-analysis and not blindly trust that the authors have applied the methodology correctly.

RECOMMENDATIONS
For consistency, the term "meta-analysis" should be used exclusively to refer to the specific, rigorous methodology applied in the medical sciences (Borenstein et al. 2009, Higgins and Green 2011). As part of a systematic review (Fig. 1), a meta-analysis should include all of the following seven steps: (1) generating an effect size metric based on continuous data, binary data or correlations; (2) weighting effect sizes by sample size or precision; (3) pooling of effect sizes into a summary effect or reasoning against pooling (e.g., due to high variation between effect sizes); (4) calculating confidence intervals for each effect size and the summary effect; (5) quantifying total heterogeneity/variability (i.e., not only between-group heterogeneity) in effect sizes by an index measure; (6) if heterogeneity is existent: exploring heterogeneity/variability in effect sizes by considering explanatory variables (e.g., in subgroup analyses or meta-regressions); (7) presenting results in forest plots or providing the respective data (effect sizes, corresponding confidence intervals and weights for all included studies) elsewhere (e.g., in a table).

CONCLUSION
We call upon authors and reviewers to apply the term "meta-analysis" consistently and correctly and not to confuse it with other summary analysis techniques. We further point out the importance of comprehensive data reporting in primary research to allow for meta-analysis (Côté and Reynolds 2012, Gurevitch and Hedges 2001, Nakagawa and Poulin 2012). Accordingly, standard reporting guidelines are the rule in the medical sciences (CONSORT 2012, The EQUATOR Network 2012) as well as in psychology (American Psychological Association 2009). Equally important is the reporting of data and analytical tools from meta-analyses (Ellison 2010, Nakagawa and Poulin 2012) and, again, guidelines for the reporting of meta-analyses are stated by PRISMA (Moher et al. 2009) in the medical sciences, by MARS (American Psychological Association 2009) in psychology and by MAER-Net (Stanley et al. 2013) in economics. Authors, editors and reviewers in the fields of ecology and conservation biology can thereby not only greatly contribute to setting the stage for meta-analyses by comprehensive reporting of primary research results, but also enhance the transparency of meta-analyses. We feel confident that meta-analysis will prove a vital technique for summarizing the wealth of primary research results in the fields of ecology and conservation biology over the next decade.

Fig. 1. The steps of a systematic review (based on Borenstein et al. 2009, Higgins and Green 2011).

Fig. 3. Rating score 1 (first rating round) shows a positive correlation with a journal's impact factor (taken from Journal Citation Reports 2010 on the Web of Knowledge; Kendall's tau = 0.244, N = 124).

Fig. 4. Rating score 2 (second rating round) shows no correlation with a journal's impact factor (taken from Journal Citation Reports 2010 on the Web of Knowledge; Kendall's tau = −0.025, N = 81).

Table A1. All 160 articles found through the literature search in the Web of Knowledge (August 2011) and reasons for the exclusion of 27 articles.