The role of replication studies in ecology

Abstract

Recent large-scale projects in other disciplines have shown that results often fail to replicate when studies are repeated. The conditions contributing to this problem are also present in ecology, but there have not been any equivalent replication projects. Here, we survey ecologists' understanding of and opinions about replication studies. The majority of ecologists in our sample considered replication studies to be important (97%), not prevalent enough (91%), worth funding even given limited resources (61%), and suitable for publication in all journals (62%). However, there is a disconnect between this enthusiasm and the prevalence of direct replication studies in the literature, which is much lower (0.023%: Kelly, 2019) than our participants' median estimate of 10%. This may be explained by the obstacles our participants identified, including the difficulty of conducting replication studies and of funding and publishing them. We conclude by offering suggestions for how replications could be better integrated into ecological research.

Direct replications repeat the original study as closely as possible (this is the default meaning of "replication" in many disciplines) and conceptual replications make deliberate variations. The dichotomy between direct and conceptual is an oversimplification of a noisy continuum, and many more fine-grained typologies exist (for a summary see Fidler & Wilcox, 2018), including ones specific to ecology and evolutionary biology (Kelly, 2006; Nakagawa & Parker, 2015). Broadly speaking, replication studies at the "direct" end of the continuum assess the "conclusion" validity of the original findings (whether the originally observed relationship between measured variables is reliable). Those original findings might be invalid because sampling error led to a misleading result, or because of questionable research practices or even fraud. Replication studies at the "conceptual" end of the continuum test generalizability and robustness; this includes what has previously been termed "quasireplication," where studies are replicated in different species or ecosystems. Where a replication study sits on the direct-conceptual continuum, and what epistemic function it fulfils, depends on the scope of the claim in the original study and how the replication study conforms to or differs from it. For example, imagine I am conducting research in the Great Barrier Reef, and I collect data from some locations in the northern part of the reef. If, after analyzing my results, I make explicit inferences to the Great Barrier Reef as a whole, then studies anywhere along the reef employing the same methods and protocols as the original could reasonably be considered direct replications (within reasonable time constraints, of course). However, if I had constrained my inference to just the northern reef, it would not be reasonable to consider new studies sampling other locations direct replications. Replications beyond the Great Barrier Reef, for instance on coral reefs in the Red Sea, would be conceptual replications in both cases.
In Table 1, we illustrate how varying different elements of a study while holding others constant can allow us to interrogate different aspects of its conclusion. However, as the reef example demonstrates, whether any given replication is considered direct or conceptual is intrinsically tied to the scope of the inference in the original research claim.
It is worth noting, in advance of the next section, that the large-scale replication studies from other disciplines we describe there, and their associated replication success rates, refer exclusively to direct replication studies.

| Cause for concern over replication rates
Over the last 8-10 years, concern over a "replication crisis" in science has mounted. The basis of this concern comes from large-scale direct replication projects in several fields which found low rates of successful replication. Studies included in these projects all attempted fair tests of the original hypothesis, and most were conducted with advice from the original authors. This may mean that the location or time of the replication study differed from the original, but only in cases where location was not specified as being part of the scope of the claim in the original study.
Low rates of successful replication are usually attributed to poor reliability because of low statistical power in the original studies (Maxwell, Lau, & Howard, 2015); publication bias toward statistically significant results (Fanelli, 2010, 2012; Franco et al., 2014); and the use of questionable research practices (e.g., selectively reporting statistically significant variables, hypothesizing after results are known: Agnoli, Wicherts, Veldkamp, Albiero, & Cubelli, 2017; Fraser, Parker, Nakagawa, Barnett, & Fidler, 2018; John, Loewenstein, & Prelec, 2012).

TABLE 1 Direct and conceptual replications in ecology. "S" means that the study element in the replication study is similar enough to the original study that it would be considered a fair test of the original hypothesis, and "D" means that the study element is distinctly different in original and replication studies, testing beyond the original hypothesis.

In ecology and evolution, there are already examples of replication studies that failed to support the original findings (Sánchez-Tójar et al., 2018; Seguin & Forstmeier, 2012; Wang et al., 2018). In addition, all of the conditions expected to drive low rates of replication mentioned above appear common in ecology and evolution (Fidler et al., 2017; Parker et al., 2016): low power (Jennions & Møller, 2000), publication bias (Cassey, Ewen, Blackburn, & Møller, 2004; Fanelli, 2012; Franco et al., 2014; Jennions & Møller, 2002; Murtaugh, 2002), and prevalence of questionable research practices (Fraser et al., 2018).

| Scientists' attitudes toward replication
In the late 1980s, sociologists of science Mulkay and Gilbert interviewed a sample of biochemists about their replication practices.
In particular, they were interested in whether these scientists replicated others' work. Most reported that they did not. And yet, the scientists uniformly claimed that their own work had been independently replicated by others. This suggests an implausible state of affairs in which everyone's work is replicated but no one is doing the replicating (Box 1).
Mulkay and Gilbert's explanation of this potential contradiction rested on the notion of "conceptual slippage." That is, the definition of "replication" that researchers brought to mind when asked about replicating others' work was narrow, centering on direct or exact replication. When considering whether their own work had been replicated by others, they broadened their definition of replication to allow conceptual replication (different operationalizations and measurements, extensions, etc.). Mulkay and Gilbert referred to the former as "mere replication" and report that it was rarely valued by the scientists in their interview sample. For example, one interviewee, referring to another laboratory known for replicating studies, said: "They actually take pride in the fact they are checking papers that have been published by others, with the result that a great deal of confirmatory work precludes their truly innovative contribution to the literature" (Mulkay & Gilbert, 1991, p. 155).
Dismissal of the value of direct replication research is echoed in Madden, Easley, and Dunn's (1995) survey of journal editors.

| Rationale for the current study
Our goal here is to document and evaluate researchers' self-reported understanding of, attitudes toward, and (where applicable) objections and obstacles to engaging in replication studies.
The current work investigates Kelly's (2006) argument that there exists in ecology "a general disdain by thesis committees… and journal editors for nonoriginal research" (p. 232). In a proposal later echoed by the findings of Ahadi et al. (2016), Kelly suggested that replication studies may be hard to publish when they agree with the original findings, because they do not add anything novel to the literature, and also when they disagree with the original findings, because the evidence from the original study is given greater weight than the refuting evidence. The current project is, in the broadest sense, an empirical investigation of these issues.

| Survey participants
We distributed paper and online versions of our anonymous survey. We have no reason to expect the populations we sampled to differ from other populations of ecologists in their opinions regarding replication.
However, replication studies in other locations would be needed to assess the generalizability of our results.

| Survey instrument
Our survey included multiple-choice questions about the following:
• How important replication is in ecology
• Whether replication is necessary for results to be believed or trusted
• Whether there is enough replication taking place
• Whether replication is a good use of resources

We also asked participants to specify the percentage of studies they believe to be replicated in ecology using a slider bar, and asked free-text questions about the following:
• Aside from replications, what might make participants believe or trust a result
• What the obstacles to replication are

| Data analysis
The code and data required to computationally reproduce our results, along with the qualitative responses, are available from https://osf.io/bqc74/. For each of the multiple-choice questions, we plotted the proportion (with 95% confidence intervals, CIs) of researchers who selected each option (e.g., the proportion of researchers who indicated that replication was "Very Important," "Somewhat Important," or "Not Important" in ecology) using ggplot2.

Around a third of our sample agreed that replication is important with caveats, suggesting that, given limited funding, the focus should remain on novel research (37%, 95% CI: 32%-41%, n = 157 of 428 participants; Figure 1c) or that replication studies should only be published in special editions or specific journals (30%, 95% CI: 25%-34%, n = 126 of 427 participants). We specifically worded these response items (i.e., pointing to funding scarcity, and publishing only in special issues) to mitigate demand characteristics, that is, undue influence to provide a positive answer to a survey question.
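As an illustrative sketch (not the authors' analysis code, which used ggplot2 in R), the percentages and intervals reported in this section are consistent with a standard normal-approximation (Wald) confidence interval for a proportion; the choice of the Wald interval here is an assumption, since the method is not stated in the text:

```python
import math

def wald_ci(k: int, n: int, z: float = 1.96):
    """Normal-approximation (Wald) 95% CI for a proportion k out of n."""
    p = k / n
    half_width = z * math.sqrt(p * (1 - p) / n)
    return p, p - half_width, p + half_width

# 157 of 428 participants: focus should remain on novel research
p, lo, hi = wald_ci(157, 428)
print(f"{p:.0%} (95% CI: {lo:.0%}-{hi:.0%})")  # prints "37% (95% CI: 32%-41%)"
```

For n = 157 of 428 this yields 37% (32%-41%), matching the interval reported above.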

| Believability and trust
When asked "does an effect or phenomenon need to be successfully replicated before you believe or trust it," 43% (95% CI: 38%-48%) of participants indicated that it does.

| Checking for replications
We asked how often participants checked for replication studies when they came across an effect or phenomenon that was plausible versus implausible. Very few participants (9%, 95% CI: 7%-12%, n = 39 of 429 participants) "almost always" checked whether a study had been replicated if they thought the result was plausible. Participants were more likely to check for replication studies if they found the effect implausible, but even then, only 27% (23%-31%, n = 116 of 429 participants) said that they "almost always" checked (Figure 2).

| What is a replication study?
In order to get a picture of what our sampled ecologists consider to be replication studies, we asked participants to select as many options as they wanted from Table 3. The top four options represent the spectrum of replication studies from most direct (first option) to most conceptual (fourth option). The number of participants who considered the options to be replication studies decreased with decreasing similarity between original and replication study. Options 5 and 6 in Table 3 relate to computationally reproducing the results by reanalyzing a study's data. Computational reproducibility is a concept related to replication and has a similar, if more limited, epistemic purpose: if the analysis is kept the same, it can detect mistakes and inconsistencies in the original analysis.

We tested whether different understandings of the definition or scope of replication produced different estimates of the rate of replication studies. We divided participants' estimates of replication rates according to which of the study types in Table 3 each participant considered a type of replication. The estimated replication rate was similar in all subsets.
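To make the distinction concrete, here is a minimal toy sketch of computational reproducibility (all data and values are hypothetical, invented purely for illustration): rerun the original analysis, unchanged, on the archived data and check that it returns the reported value.

```python
# Hypothetical archived dataset and reported summary statistic.
archived_data = [3.1, 2.7, 3.4, 2.9, 3.0]  # e.g., clutch sizes from the original study
reported_mean = 3.02                       # value printed in the (hypothetical) paper

# Computational reproduction: same data, same analysis, no new data collection.
recomputed_mean = sum(archived_data) / len(archived_data)

# If this check fails, there is a mistake or inconsistency in the original analysis.
assert abs(recomputed_mean - reported_mean) < 0.005
```

Replication, by contrast, would collect new data; reproduction only re-executes the original analysis.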

| Obstacles to replication studies
When asked to comment on the obstacles to replication, 407 participants provided free-text responses, giving insight into why the replication rate might be low (Table 4).

| Importance of replication
The overwhelming majority of the ecologists in our study were very positive about replication studies. They considered replication studies to be important, wanted to see more of them in the literature, and supported publishing them (Figure 1a).

Replication means different things in different fields.
In biodiversity research replication of studies/phenomena, typically with different settings, species, regions etc., is absolutely essential. The question is when there is enough evidence, i.e. when to stop.
There is little point in replicating the study EXACTLY (cf. your question 9 above). In molecular biology or e.g. ecotoxicology it seems that doing the latter actually makes more sense. Different labs should span together and run the same experiment in parallel to eventually publish together.

TABLE 3 Statements of different types of variation a new study might make to an original, and the percentage of total participants (n = 430) who considered each variation type a "replication study." Also shown is the mean estimate of the replication rate in ecology, calculated separately for participants who indicated that each option constituted a "replication study."

Statement | Considered a "replication study" (95% CI) | Mean estimate of replication rate in ecology (95% CI) (a)
Redoing an experiment or study as closely as possible to the original (e.g., with same methods and in the same context, region, or species) | 90% (87-92) | 21% (19-24)
Redoing an experiment or study with same (or similar) methods in a new context (region or species, etc.) | 36% (32-41) | 21% (17-24)
None of the above | 1% (0-2) | NA

(a) Mean is used rather than median because it is more sensitive to differences between subsets of participants.

However, there is a disconnect between this message of support for replication studies expressed in portions of our survey and the data on how researchers publish, use, and prioritize replications.

TABLE 4 Obstacles to replication studies identified in participants' free-text responses.
First, the best available estimate is that only 0.023% of ecology studies are identified by their authors as replications (Kelly, 2019). This is tiny compared to our participants' median estimate of 10%. The disconnect is evident even within our survey, where only a minority of respondents claimed to "almost always" check for replications when investigating a finding (Figure 2), despite emphasizing the importance of replication in other questions and free responses. Similarly, around a third of participants also indicated that, given limited funding, the focus should continue to be on novel research (Figure 1c) and that replication studies should only be published in special editions or dedicated replication journals, or only if the results differ (Figure 1d). This, combined with comments such as "People often want to research something novel, I think there's a mental block among scientists when it comes to replication; most recognize it's necessary, but most aren't particularly interested in doing it themselves," suggests a gap between the perceived value of replication studies and the impetus to perform them. Comments such as this expose the mistake of assuming that replication work, even direct replication, cannot make a novel contribution. For example, working out which aspects of a study are intrinsic to its conclusion and should not be varied in a replication is itself a substantial intellectual contribution (Nosek & Errington, 2017).
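The size of that first disconnect is easy to quantify from the two figures just given (no new data, only arithmetic on the rates above):

```python
# Figures from the text: observed rate of self-identified replications (Kelly, 2019)
# versus our participants' median estimate of the replication rate.
observed_rate = 0.00023    # 0.023%
median_estimate = 0.10     # 10%

overestimate_factor = median_estimate / observed_rate
print(f"Participants' median estimate is ~{overestimate_factor:.0f}x the observed rate")
```

That is, the median respondent's estimate exceeds the observed rate of self-identified replications by more than two orders of magnitude.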
This disconnect may be explained by the obstacles identified in this paper, chief among them: (a) researchers are, perhaps rightly (Ahadi et al., 2016; Asendorpf & Conner, 2012; Baker & Penny, 2016), concerned that they would have trouble publishing or funding replication studies; (b) conducting replication studies can be logistically problematic; (c) environmental variation makes conducting and interpreting the results of replication studies difficult (Shavit & Ellison, 2017); and (d) researchers are unwilling to conduct replication studies because they believe they are boring and less likely to provide prestige than novel research (Ahadi et al., 2016; Kelly, 2006).
There is movement toward making replication studies more feasible and publishable in other fields, with the inclusion of a criterion describing journals' stance on accepting replication studies as part of the TOP guidelines (Nosek et al., 2015; over 5,000 journals are signatories) and the advent of Registered Replication Reports (Simons, Holcombe, & Spellman, 2014) at several psychology journals. Similarly, initiatives like the Many Labs projects (e.g., Klein et al., 2014), StudySwap (https://osf.io/9aj5g/), and the Psychological Science Accelerator (https://psysciacc.org/) build communities that may help overcome the logistical difficulties of replication studies, as well as increasing the interest and prestige associated with conducting them. Although no initiatives to directly replicate previously published studies yet exist in ecology, there is a growing movement to improve assessment of the generality of hypotheses through collaborations across large numbers of laboratories, implementing identical experiments in different systems (Borer et al., 2014; Knapp et al., 2017; Peters, Loescher, Sanclements, & Havstad, 2014; Verheyen et al., 2016, 2017).

| Conceptual slippage
As in Mulkay and Gilbert (1991), we find evidence of conceptual slippage between different types of replication study. We asked participants whether they consider different types of potential studies "replication studies." Participants were able to select multiple options. We expected that participants who include conceptual replications in their definition of replication studies would provide higher estimates of the percentage of ecological studies that are replicated. However, there was little difference in participants' estimates of the replication rate regardless of how permissive their definition of replication was (Table 3). This suggests that ecologists have a fluid definition of what a "replication study" is. Relatedly, the majority of surveys were distributed by hand, and early in the data collection it became evident that some participants were thinking about replicates within a study (i.e., samples) rather than replication of the whole study.
As soon as this became evident, we informed each new participant that we were interested in repeating whole studies, not replicates or samples within a study. The effect of this confusion on our results is likely to be minimal: virtually all ecology studies contain within-study replicates, yet only 36 of 439 participants (8%) gave answers higher than 50% for the question "What percentage of studies do you think are replicated in ecology?". This 8% presumably captures all the participants who were answering about "replicates," as well as some who have a very broad definition of what constitutes a replication study.

| The continuum of replication
We found a very high level of agreement (90%) that "redoing an experiment or study as closely as possible to the original" (i.e., a direct replication) should be considered a replication study. Most ecologists had a view of replication studies that is much broader than direct replication: 38% considered "redoing an experiment or study with different methods in the same context," and 14% considered "redoing an experiment or study with different methods in a different context," to be replication studies. This permissive definition of a replication study may be driven by the strong influence of environmental variability on the results of ecological research. It is also consistent with Kelly's (2006) observation that conceptual and quasireplication are common in behavioral ecology. Conceptual and quasireplications are required to extend spatial, temporal, and taxonomic generalizability in a field with multitudes of study systems, all of which are strongly influenced by their environment.
Many participating ecologists commented that direct replications may be difficult or impossible in ecology because of the strong influence of environmental variability and the need for long-term studies, concerns that are also voiced by Kelly (2006), Nakagawa and Parker (2015), Kress (2017), and Schnitzer and Carson (2016). Schnitzer and Carson (2016) propose that putting more resources into ensuring that new studies are conducted over large spatial and temporal scales would perform a similar epistemic function to certain types of replication study. Nakagawa and Parker (2015) suggest that the impact of environmental variability can be overcome by conducting multiple robust replications (inevitably in different environmental conditions) and evaluating the overall trends using meta-analysis. In contrast, Kelly (2006) advocates pairing direct and conceptual replications within a single study, providing insights about both the validity and generalizability of the results and increasing the chance of publication (compared to a direct replication alone). These suggestions have the potential to make replication studies in ecology more feasible and thereby improve the reliability of the ecology literature.
Emphasizing the importance of conceptual replications may also make it easier to build a research culture that is more accepting of replication studies.
Conceptual replications may already be common in ecology and evolutionary biology, but, presumably because of the desire to appear novel, such studies are almost never identified as replications. Kelly (2006) found that even though direct replications were absent from a sample of studies in three animal behavior journals, more than a quarter of these studies could be classified as conceptual replications with the same study species, and most of the rest were "quasireplications" in which a previously tested hypothesis was studied in a new taxon. It seems, therefore, that testing previously tested hypotheses is the norm. We just do not notice, because researchers explicitly distinguish their work from previously published research rather than calling attention to the ways in which their studies are replications. In fact, almost none of these conceptual or quasireplications are identified as replications by their authors (Kelly, 2019). This brings up two shortcomings of the current system. First, as pointed out earlier, researchers almost never conduct direct replications, and so the benefits of direct replication, in terms of convincing tests of internal validity, are nearly absent. Second, even when researchers conduct conceptual or quasireplications, if they are reluctant to call their work replication, some of the inferential value of their work in testing for generality may be missed. In fact, anecdotally, it seems that inconsistency among conceptual replications is often attributed to biological variation, and that this is typically interpreted as meaning that the hypothesized process is more complex or contingent on other factors than originally thought. The generality of the original hypothesis is often not directly challenged.

| CONCLUSION
Most of our participating ecologists agreed that replication studies are important; however, some responses are suggestive of ambivalence toward conducting them. Convincing editors to accept Registered Replication Reports, emphasizing the value of less direct, more conceptual replication, and beginning grassroots replication initiatives in ecology and related fields (inspired by StudySwap, the Psychological Science Accelerator, the Many Labs projects, and existing distributed experiments in ecology) may combat ecologists' reluctance to conduct replication studies. Beyond that, we believe that the best approach to replication studies in ecology is to:
1. Identify subsets of studies for which direct or close replication is possible and, because of their importance, value and put resources into such replications. If possible, conduct these as Registered Reports (Nosek & Lakens, 2014).
3. Identify subsets of studies for which generalizability is the main concern, and work toward developing "constraints on generality" statements for them (Simons, Shoda, & Lindsay, 2017).
Constraints on generality statements explicitly identify the conditions in which the authors think their results are or are not valid. This frees replicators from matching conditions directly and allows replications for generality within constraints laid out by the original authors.

ACKNOWLEDGMENTS
Franca Agnoli provided feedback that improved the manuscript and 439 anonymous ecologists generously gave their time to fill in our survey.

CONFLICT OF INTEREST
The authors have no conflicts of interest.