An appeal for more rigorous use of counterfactual thinking in biological conservation

Received: 15 September 2020; Revised: 10 February 2021; Accepted: 25 February 2021

It is vital to understand the consequences of actions intended to ensure biological conservation. Counterfactual thinking is increasingly used to establish the difference between the results of conservation action and the outcome if no action had been taken. In essence, a counterfactual is the outcome had a conservation action or treatment not been applied. The impact of a treatment is the difference that it makes to intended (or unintended) outcomes, relative to a counterfactual condition (Ferraro & Hanauer, 2015; Pressey, Visconti, & Ferraro, 2015). Since the use of counterfactual thinking is increasing steadily in conservation impact evaluation, we outline here five potential challenges to the rigorous application of the approach, which mainly stem from a failure to recognize that there may be multiple counterfactual states and that their construction requires care and transparency to ensure reproducibility.
Quantitative impact evaluation designs for most conservation problems can broadly be divided into two categories: strict experimental designs (e.g., Randomized Controlled Trials [RCTs]), and quasi-experimental designs (e.g., Before-After-Control-Intervention [BACI] designs; see Schleicher et al., 2020). The latter may have components of experimental design, for instance where appropriate comparison groups are identified in a landscape with some random assignment or when experimental comparison groups are assigned statistically (often identified with "matching methods"), or they may be truly nonexperimental (treatment designation is nonrandom and completely outside the control of the researcher, for instance change over time; see Margoluis, Stem, Salafsky, & Brown, 2009). Such approaches are common, particularly because many conservation actions predate the evaluation process, and thus many counterfactuals are analyzed post hoc, that is, after the action has been implemented.
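As a minimal sketch of how a BACI design yields an impact estimate, the change at treated sites can be compared with the change at control sites over the same period. The function name `baci_effect` and the forest cover values are hypothetical and serve only to illustrate the arithmetic; a real analysis would also need to quantify uncertainty.

```python
from statistics import mean


def baci_effect(control_before, control_after, treated_before, treated_after):
    """Change at treated sites minus change at control sites."""
    control_change = mean(control_after) - mean(control_before)
    treated_change = mean(treated_after) - mean(treated_before)
    return treated_change - control_change


# Hypothetical forest cover (%) at three sites per group (illustrative values only).
effect = baci_effect(
    control_before=[80.0, 70.0, 90.0],
    control_after=[70.0, 60.0, 80.0],
    treated_before=[80.0, 75.0, 85.0],
    treated_after=[78.0, 73.0, 83.0],
)
print(effect)  # 8.0
```

In this invented example the control sites lose 10 percentage points of cover and the treated sites lose 2, so the estimated impact is 8 percentage points of avoided loss, even though cover at the treated sites still declined.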
Counterfactual conditions for such conservation problems can be constructed in a variety of ways. The simplest is to state a conceptual model, in which the impact of a conservation treatment is compared with what is thought of as a control treatment (after a time period, an area not under treatment, etc.; Wilkie et al., 2006; Mascia et al., 2014). Matching methods can also be used, the purpose of which is to reduce differences between treatment and control groups so that only the impact of the treatment is highlighted. Matching methods aim to reduce bias from confounding variables by finding, for each treatment, one or more controls with similar observable characteristics that may alter inference (abiotic, biotic, and socioeconomic), and by statistically comparing only treatments and controls that are similar in those characteristics (see Schleicher et al., 2020 for a recent comprehensive overview of the techniques). The last approach is to predict formally the expected outcome had the treatment not been applied, for example by modeling counterfactual estimates (e.g., for species abundance) and comparing these with actual data (Hoffmann et al., 2015).
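The matching idea can be sketched as follows, assuming the simplest variant: one-to-one nearest-neighbor matching with replacement on covariates that have already been scaled to comparable ranges. This is only one of the techniques reviewed by Schleicher et al. (2020); the function names, covariates, and values below are hypothetical.

```python
from statistics import mean


def nearest_control(treated_unit, controls, covariates):
    """Return the control closest to treated_unit in (pre-scaled) covariate space."""
    def distance(control):
        return sum((treated_unit[c] - control[c]) ** 2 for c in covariates) ** 0.5
    return min(controls, key=distance)


def matched_impact(treated, controls, covariates, outcome):
    """Mean outcome difference between treated units and their matched controls."""
    differences = [
        unit[outcome] - nearest_control(unit, controls, covariates)[outcome]
        for unit in treated
    ]
    return mean(differences)


# Hypothetical sites: covariates scaled to 0-1, outcome is forest loss (%).
treated = [
    {"elevation": 0.9, "road_dist": 0.8, "forest_loss": 2.0},
    {"elevation": 0.2, "road_dist": 0.3, "forest_loss": 6.0},
]
controls = [
    {"elevation": 0.85, "road_dist": 0.75, "forest_loss": 3.0},
    {"elevation": 0.25, "road_dist": 0.2, "forest_loss": 10.0},
    {"elevation": 0.5, "road_dist": 0.5, "forest_loss": 7.0},
]
impact = matched_impact(treated, controls, ["elevation", "road_dist"], "forest_loss")
print(impact)  # -2.5
```

The choice of covariates passed to `matched_impact` is itself an evaluator decision, so reporting that list explicitly is part of stating the counterfactual.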

CHALLENGES TO APPLYING COUNTERFACTUAL THINKING
We identify five key challenges to the application of counterfactual thinking.
First, it is not straightforward to apply counterfactual thinking consistently. Statements of what constitutes counterfactual conditions reflect the evaluator's perceptions of causality in the study system and can be even more problematic if not explicitly stated (Jones et al., 2017; Pouzols, Burgman, & Moilanen, 2012; Sonter, Tomsett, Wu, & Maron, 2017). Given the widespread impacts of the Anthropocene, in which much natural habitat has been destroyed and policy landscapes are discontinuous, finding adequate controls may be impossible in some evaluation settings. In many scenarios, there is a range of potential counterfactual conditions, rather than any one or only "the counterfactual," and so their correct identification is very strongly related to the aims of the intended outcome (Bull, Strange, Smith, & Gordon, 2020; Peterson, Maron, Moilanen, Bekessy, & Gordon, 2018). The use of different counterfactuals by different individuals will give rise to perceived differences in the impacts of interventions, and lead to disagreement about the effectiveness of interventions (Bull et al., 2020). Indeed, counterfactual thinking could intentionally be misused, if perversely applied to favor an outcome of interest. Biodiversity offset schemes can intentionally or unintentionally set targets that reach some biodiversity baseline formulated by specific counterfactual statements but may not necessarily achieve the intended conservation progress (see Maron, Gordon, Mackey, Possingham, & Watson, 2015; Simmonds et al., 2019; Sonter et al., 2017). For example, the offset ratios may be inadequate if they fail to account for the risk of offset failure (for a quantitative example see Lindenmayer et al., 2017). This inconsistency when applying counterfactual thinking is particularly problematic when using matching methods, as the evaluator must decide which variables to include and which to exclude.
Modeling counterfactual conditions is also inherently uncertain because this involves predicting future trends (Ferraro & Pattanayak, 2006; Pouzols et al., 2012; Sonter et al., 2017). Indeed, the determination of counterfactual conditions in science has regularly been highly contentious, as exemplified by the controversy that raged in ecology in the 1970s over whether the occurrence of interspecific competition could be determined from analyses of the distributions of species, and, in so doing, what could be regarded as assumptions and what as outcomes (recently revisited by Connor, Collins, & Simberloff, 2013).
Second, great experiments are difficult in the "wicked" world in which conservation operates. Whereas the Randomized Controlled Trial (RCT) has served fields such as medicine well as a stalwart for obtaining evidence, few conservation treatments lend themselves to it, because of the complexity and inter-relatedness of the many drivers that may influence outcomes (Margoluis et al., 2009). This does not mean that evaluators should not attempt the use of RCTs, but the field requires expanded testing, refinement, and best practice guidelines to ensure more robust study designs and inferences. Encouragingly, new work is squarely addressing these challenges, for instance demonstrating that randomization techniques can improve causal inferences from landscape-level RCTs (Wiik et al., 2019). Recommendations to improve the use of RCTs include (following Pynegar, Jones, Gibbons, & Asquith, 2018; Wiik et al., 2019): (a) reducing spillover effects (where treatments affect outcomes in non-treated units) by careful selection of randomization units; (b) careful monitoring of study sites to assess temporal changes; (c) recognition that RCTs may not be appropriate in many cases, especially if interventions are not well developed and comprehensive; (d) stating a priori which confounding variables may influence both treatment and non-treatment sites; and (e) recognizing that true double-blinded designs are near-impossible in most interventions.
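Recommendation (a) can be illustrated with a minimal sketch in which randomization is done at the level of whole clusters (for example, villages or catchments) rather than individual units, so that neighboring units always share a treatment status. The function name, unit labels, and cluster labels are hypothetical, and the even split between arms is an assumption made for illustration.

```python
import random


def assign_by_cluster(unit_to_cluster, seed=0):
    """Randomly assign whole clusters, and hence all their units, to an arm."""
    clusters = sorted(set(unit_to_cluster.values()))
    rng = random.Random(seed)  # fixed seed so the assignment can be reproduced
    rng.shuffle(clusters)
    treated_clusters = set(clusters[: len(clusters) // 2])
    return {
        unit: "treatment" if cluster in treated_clusters else "control"
        for unit, cluster in unit_to_cluster.items()
    }


# Hypothetical farms grouped by catchment.
farms = {
    "farm_1": "catchment_A", "farm_2": "catchment_A",
    "farm_3": "catchment_B", "farm_4": "catchment_B",
    "farm_5": "catchment_C", "farm_6": "catchment_C",
    "farm_7": "catchment_D", "farm_8": "catchment_D",
}
assignment = assign_by_cluster(farms, seed=42)
```

Recording the seed alongside the assignment is one simple way to keep the randomization step transparent and reproducible.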
Third, many conservation actions pre-date the use of explicitly stated counterfactual conditions. Ideally, evaluations of actions need to be built into the original study design and data collected under a range of scenarios, both with and without the action (Ferraro & Pattanayak, 2006), and before the onset of interventions (Mascia et al., 2014). Unfortunately, despite a plethora of calls (Ferraro & Pattanayak, 2006; Margoluis et al., 2009; Mascia et al., 2014), this is still rarely done. In consequence, some counterfactual thinking can only be applied post hoc, after the conservation actions have been applied, but this exacerbates the abovementioned constraints. For example, much of the world's current terrestrial protected area estate was established long ago. Those seeking to demonstrate the conservation effectiveness of protected areas in contrast to other land uses have commonly based those comparisons on datasets measuring differences in response variables within and around protected areas (Gray et al., 2016). Estimating counterfactual conditions here is complex, since true control areas, those which are not protected, untransformed, not under alternative land use, and similar in all other characteristics, are uncommon. This may mean that controlling for observable bias in such systems is harder. Estimates of protected area efficacy, in terms of avoided deforestation, were generally higher when derived from traditional methods than from counterfactual methods (Ribas, Pressey, Loyola, & Bini, 2020). In general, a distinction must be drawn between counterfactual conditions that have been applied before the onset of interventions, and those that have been applied post hoc, as the former are assumed to be more informative (Mascia et al., 2014).
Fourth, counterfactual thinking is vulnerable to an incomplete understanding of the study system, which may hamper the construction of counterfactual conditions. This may stem from the "knowing-doing gap" between researchers and practitioners. In practice, practitioners often decide what to do based on the already available evidence and the urgency driving the need for action, and apply this in adaptive management frameworks (whether formally recognized as such or not), rather than embarking on a new evaluation exercise. Experimental studies, in particular, may take years to complete, and so there may be an aversion to implementing them.
Finally, a culture of impact evaluation does not yet broadly exist across the conservation field. The problem is far from trivial; few conservation practitioners are familiar with the mainstream impact evaluation approaches (Mascia et al., 2014), and yet their field is one of the most challenging settings in which to apply them (Ferraro, 2009). Granting mechanisms do not always request or allow ring-fenced funding for project monitoring and evaluation frameworks or, if they do, do not outline which best practice approaches should be used to achieve adequate monitoring of interventions. Similarly, it is unclear to what extent policy and civil society at large are demanding evidence for actions, given that some ignore evidence in other arenas, such as climate change. The impetus for a "culture of evaluation" must therefore be driven from within the conservation field itself, while ensuring transparency about the limitations and advantages of evaluation approaches.
More rigorous use of counterfactual thinking in biological conservation can benefit from the following. First, evaluators must carefully and thoroughly state and explain the rationale behind counterfactuals to ensure full transparency and so that those assumptions may be queried by others. Second, the outcomes of a range of different counterfactuals in an evaluation must be explored and stated, so that readers can themselves decide what the implications are of adopting different approaches. Third, better links and dialogue between practitioners and researchers will ensure that the needs of both parties are met, and that new developments in the science of counterfactual use provide actionable methods to practitioners for real-world problems. Fourth, an expansion of research is required into the inherent biases in constructing counterfactuals and their consequences for interpreting conservation evaluations. Ultimately, a comprehensive set of best practice guidelines must be developed and adopted by the conservation fraternity, not simply in the literature, but with the collaboration of practitioners in consultative processes.
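The second recommendation, reporting outcomes under a range of counterfactuals, can be sketched very simply: the same observed outcome is compared against each stated counterfactual and all resulting impact estimates are reported side by side. The counterfactual labels and abundance values below are hypothetical.

```python
def impacts_under_counterfactuals(observed, counterfactuals):
    """Impact of the action under each named counterfactual (observed minus counterfactual)."""
    return {name: observed - value for name, value in counterfactuals.items()}


# Hypothetical species abundance after an intervention, and three candidate
# counterfactual abundances, each reflecting a different stated assumption.
observed_abundance = 120
candidate_counterfactuals = {
    "no change from pre-intervention baseline": 100,
    "continuation of prior decline": 80,
    "trend observed at comparison sites": 110,
}
impacts = impacts_under_counterfactuals(observed_abundance, candidate_counterfactuals)
for name, impact in impacts.items():
    print(f"{name}: {impact:+d}")
```

In this invented case the estimated impact ranges from +10 to +40 individuals depending solely on which counterfactual is adopted, which is the kind of spread that readers need to see in order to judge the conclusions for themselves.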
In conclusion, it is critical to understand where the benefits and challenges of applying counterfactual thinking lie in conservation evaluation. This issue remains unresolved in the field. Research into the use of counterfactual thinking in conservation program evaluation is ongoing and will need to be expanded if counterfactual thinking is to be applied rigorously in impact evaluation. The assumptions underpinning impact evaluations need to be explicitly reported to ensure transparency of the context in which the results of evaluations were applied, ensuring a more robust understanding of the true effectiveness of conservation actions.