The ecological outcomes of biodiversity offsets under “no net loss” policies: A global review

No net loss (NNL) biodiversity policies mandating the application of a mitigation hierarchy (avoid, minimize, remediate, offset) to the ecological impacts of built infrastructure are proliferating globally. However, little is known about their effectiveness at achieving NNL outcomes. We reviewed the English‐language peer‐reviewed literature (capturing 15,715 articles), and identified 32 reports that observed ecological outcomes from NNL policies, including >300,000 ha of biodiversity offsets. Approximately one‐third of NNL policies and individual biodiversity offsets reported achieving NNL, primarily in wetlands, although most studies used widely criticized area‐based outcome measures. The most commonly cited reason for success was applying high offset multipliers (large offset area relative to the impacted area). We identified large gaps between the global implementation of offsets and the evidence for their effectiveness: despite two‐thirds of the world's biodiversity offsets being applied in forested ecosystems, we found none of four studies demonstrated successful NNL outcomes for forested habitats or species. We also found no evidence for NNL achievement using avoided loss offsets (impacts offset by protecting existing habitat elsewhere). Additionally, we summarized regional variability in compliance rates with NNL policies. As global infrastructural expansion accelerates, we must urgently improve the evidence‐base around efforts to mitigate development impacts on biodiversity.


INTRODUCTION
We are living in an age of both severe biodiversity declines and unprecedented global expansion of built infrastructure (IPBES 2019; Laurance et al., 2015). Approximately a quar-transport networks, urban footprint, and energy production facilities already under way (Steffen, Broadgate, Deutsch, Gaffney, & Ludwig, 2015). Mitigating these impacts is therefore an urgent global priority. Currently, among the most widely used tools for addressing the environmental impacts of infrastructure are No Net Loss (NNL) policies (Bennett et al., 2017), which mandate that a mitigation hierarchy (MH) is applied to sequentially avoid, minimize, remediate, and offset the biodiversity impacts of new developments (Bennett et al., 2017), with some variation among policies (e.g., U.S. mitigation sequence: avoid; minimize; compensate).
NNL policies are proliferating around the world (Bennett et al., 2017), reflected in the widespread implementation of biodiversity offsets. Throughout, we use the term "biodiversity offsets" to refer to all offsets implemented as the final stage of NNL policies, as nearly all policies focus on achieving outcomes that are related to or underpinned by biodiversity. However, the exact ecological characteristics for which these policies aim to achieve NNL vary considerably (e.g., U.S. wetland compensatory mitigation protects "wetland acreage and function" (EPA, 2008)). There is a notable lack of evidence regarding the actual outcomes of NNL policies because of the relative immaturity of many policies, a lack of data transparency surrounding NNL implementation, and challenges evaluating largely unobservable outcomes of the MH process (e.g., identifying avoided impacts; Sinclair, 2018). Much of the evidence of NNL effectiveness comes from individual offset case studies or simulation studies (e.g., Sonter, Tomsett, Wu, & Maron, 2017; Thorn, Hobbs, & Valentine, 2018). In the absence of a coherent body of evidence regarding actual outcomes, many theoretical criticisms and defenses of NNL have been discussed in the literature. Criticisms revolve around the ecological feasibility of restoration (Maron et al., 2012), the choice and definition of biodiversity "units" (Bull, Suttle, Gordon, Singh, & Milner-Gulland, 2013), perverse incentives to game offset policies through manipulation of counterfactuals (Gordon, Bull, Wilcox, & Maron, 2015), the ethics of biodiversity trading (Ives & Bekessy, 2015), and the weakening of institutions that safeguard the environment (Walker, Brower, Stephens, & Lee, 2009). In response, defenses of NNL acknowledge that well-targeted infrastructural expansion can deliver considerable well-being benefits, and when applied according to best practice (Bennett et al., 2017; Bull et al., 2013), NNL can facilitate this without damaging biodiversity overall.
Furthermore, NNL buffers impacts on biodiversity that would most likely occur anyway in the absence of NNL policy (von Hase & ten Kate, 2017). Additionally, the organization and financing of offsets may make avoiding impacts initially more favorable to developers (Calvet, Napoléone, & Salles, 2015). However, without an empirically grounded evidence base, it is unclear which arguments dominate in practice.
Evidence from case studies shows that NNL policies result in both successes and failures (Quigley & Harper, 2006a). As with any conservation intervention, developing evidence about the contextual factors that predict NNL success is essential. Additionally, researchers have reviewed and tested the major indicators of biodiversity proposed for use in NNL and evaluated whether they provide useful approximations of biodiversity changes (Bezombes, Gaucherand, Spiegelberger, Gouraud, & Kerbiriou, 2018). However, little work has synthesized which indicators are used in the practical implementation of NNL globally.
Several high-profile NNL policies have now been implemented for sufficient timescales for a preliminary understanding of outcomes to emerge (e.g., Gibbons, Macintosh, Constable, & Hayashi, 2018). However, there remains no synthesis of all the information available on the actual observed outcomes of NNL policies from around the world (i.e., whether they have demonstrably achieved NNL of their ecological characteristic of interest). Addressing this, we reviewed the global literature on the outcomes of NNL policies to synthesize literature gaps and coverage, summarize the state of knowledge on the determinants of NNL outcomes, assess the biodiversity metrics used in practice, assess regional compliance with NNL policies, and evaluate the validity of the existing literature. For clarity, our study addresses both the effectiveness of NNL policies (i.e., the application of the MH to development impacts under jurisdiction of an NNL policy) and individual biodiversity offsets (i.e., whether or not offsets achieve NNL in chosen biodiversity indicators at project scales).

Review protocol
We conducted a rapid evidence assessment (Khangura, Konnyu, Cushman, Grimshaw, & Moher, 2012) of peer-reviewed literature on NNL outcomes. Our search term (Supporting Information) comprised a set of strings linked by Boolean operators describing the following:
• alternative offset types (e.g., "environmental"),
• "offset" and commonly used alternatives (e.g., "compensat*"),
• impact evaluation (e.g., "outcome*"),
• and excluding nuisance terms (refined by identifying unrelated papers in the first 200 hits of our Web of Science review; e.g., "gas mitigation").
Performing the same search in Web of Science and Scopus databases (final search date March 13, 2019), we removed repeats and then reviewed the remaining studies using the "metagear" package in R (Lajeunesse, 2016; R Core Team 2018). We conducted a first assessment of potentially relevant literature by selecting all studies mentioning NNL policies or offsets in their abstracts, then read the full papers to identify whether our inclusion criteria were met. We limited our review to studies published from 2003 to 2019, to account for the major reforms to the effectiveness of U.S. wetland mitigation policy introduced by the National Wetlands Mitigation Action Plan in December 2002 (Hough & Robertson, 2009). We restricted our search to English-language articles from relevant topic categories (Supporting Information). Previous research has shown that English captures most literature on offsets tied to international funding requirements, studies from North America and Oceania, and a substantial proportion of European literature, so our findings should be representative of the global literature. Additionally, we searched through all reference lists in papers meeting our inclusion criteria for additional literature.
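The search and de-duplication steps described above can be illustrated with a minimal Python sketch. It is not the authors' workflow (they screened records with the "metagear" package in R, and the full search term is in the Supporting Information). Only the quoted example terms come from the text; the term "biodiversity", the helper names, and the title-based matching rule are assumptions made for illustration.

```python
def or_group(terms):
    """Join alternative terms with OR; quote multi-word phrases."""
    quoted = [f'"{t}"' if " " in t else t for t in terms]
    return "(" + " OR ".join(quoted) + ")"


def build_query(include_groups, exclude_terms):
    """AND together the OR-groups, then exclude nuisance terms with NOT."""
    query = " AND ".join(or_group(group) for group in include_groups)
    if exclude_terms:
        query += " NOT " + or_group(exclude_terms)
    return query


def deduplicate(records):
    """Drop repeats across databases, matching on a normalized title.

    Matching on titles alone is an assumption; a real workflow might
    also compare DOIs, authors, or publication years.
    """
    seen, unique = set(), []
    for record in records:
        key = "".join(ch for ch in record["title"].lower() if ch.isalnum())
        if key not in seen:
            seen.add(key)
            unique.append(record)
    return unique


# Hypothetical term groups ("biodiversity" is an illustrative addition).
query = build_query(
    include_groups=[
        ["environmental", "biodiversity"],  # alternative offset types
        ["offset*", "compensat*"],          # "offset" and alternatives
        ["outcome*"],                       # impact evaluation
    ],
    exclude_terms=["gas mitigation"],       # nuisance terms
)

# Two hypothetical records for the same paper from different databases.
records = [
    {"title": "No net loss of wetlands", "source": "Web of Science"},
    {"title": "No Net Loss of Wetlands.", "source": "Scopus"},
]
unique = deduplicate(records)
```

Keeping each concept as its own OR-group makes it straightforward to add a nuisance term after inspecting the first hits, as the protocol describes.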

Data extraction
Papers were included in our database if they reported observed (i.e., not simulated) ex-post ecological or land cover-related outcomes of policies with an explicit NNL-or-better objective for aspects of biodiversity. We limited our search to peer-reviewed publications only (including conference proceedings and book chapters) to attempt to overcome the data quality issues highlighted by other reviews of offset studies that include the gray literature (Theis et al., 2019), but recognize that the majority of NNL implementation occurs outside academic evaluation. Papers reporting evaluations of individual offset projects were included if they specified the impacts (as a minimum defining the impacted habitat and area) associated with the offsets, thus allowing for a rudimentary assessment of biodiversity losses and gains. These papers compared biodiversity at offsets with either biodiversity at the impacted site (pre-initiation of impacts), or with a biodiversity reference site (see Table S2 for studies considered but ultimately rejected). Notably, while we included only those studies that allowed for a site-specific estimate of biodiversity losses and gains and thus a basic evaluation of whether NNL was achieved, some key NNL policies do not assess biodiversity losses and gains in this way (e.g., U.S. wetland mitigation policy mandates that compensation sites achieve benchmark ecological criteria rather than explicitly achieving the same level of ecosystem functioning as impacted wetlands). Therefore, such NNL policies may in theory achieve full compliance but not NNL.
For each study/individual offset project where possible we extracted information regarding the following:
• type of biodiversity outcome variable used to assess losses and gains,
• magnitude of the outcome variable at the offset and impact/control site,
• affected type of biodiversity (e.g., forest, species),
• location,
• mean offset age (mean time between offset initiation and outcome evaluation),
• spatial scale (Table 1),
• whether NNL was achieved for the outcome variable of interest,
• and article author's explanations for why/why not (including only reasons that addressed the specific outcome variable used).
For each reported outcome variable, we assigned it the appropriate level for four descriptive categories (Table 1). If a paper reported multiple ecological indices or outcome variables, we recorded them all. For individual offsets that presented time-series outcomes, we recorded the outcome variable at the latest time-period to allow the maximum time for ecological recovery in the offset-control comparison. For NNL policies presenting time-series outcomes, we took the sum of the outcome variables across time periods to capture the policy's impact across the entire evaluation period (Table S3). We recorded information about the policy outcomes across its entire geographical jurisdiction (i.e., if a paper reported localized habitat losses but NNL overall (e.g., across an entire state), we recorded that NNL was achieved). We extracted data from figures using WebPlotDigitizer (Rohatgi, 2015). We recorded the raw values for outcome variables and used them to infer NNL outcomes, except for papers that compared outcomes between offset and impact/reference sites using statistical tests, where we used the test's outcome to inform NNL designation. When studies reported that outcomes for some of the projects they evaluated were unknown, we recalculated the percentage of projects reporting successes and failures, restricting the total sample to only projects for which the outcome was known (Table S3). For offset project studies that reported per-unit-area values for a given outcome variable, we multiplied the outcome variable for the offset site by the offset ratio so that the final comparison between biodiversity at the impact and offset sites accounted for differences in area between the two (Table S3). Therefore, for project-scale evaluations, we did not include area as an outcome variable, but for program and landscape-scale evaluations, habitat area was included as an outcome.
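The two adjustments described above (scaling per-unit-area outcomes by the offset ratio, and recalculating success percentages over projects with known outcomes only) can be sketched as follows. This is an illustrative sketch, not the authors' procedure, which is documented in Table S3. The class and function names, the example values, and the "greater than or equal to" decision rule are assumptions.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class ProjectOutcome:
    impact_value: float  # per-unit-area outcome at the impact/reference site
    offset_value: float  # per-unit-area outcome at the offset site
    offset_ratio: float  # offset area divided by impacted area


def area_adjusted_offset_value(p: ProjectOutcome) -> float:
    """Scale the offset's per-unit-area outcome by the offset ratio."""
    return p.offset_value * p.offset_ratio


def achieves_nnl(p: ProjectOutcome) -> bool:
    """Assumed rule: NNL if the area-adjusted offset value is at least
    the value at the impact/reference site."""
    return area_adjusted_offset_value(p) >= p.impact_value


def percent_success_known(outcomes: List[Optional[bool]]) -> Optional[float]:
    """Percentage of successes among projects with a known outcome.

    None marks an unknown outcome and is excluded from the denominator.
    """
    known = [o for o in outcomes if o is not None]
    if not known:
        return None
    return 100 * sum(known) / len(known)


# Hypothetical project: lower density at the offset, but twice the area.
large_offset = ProjectOutcome(impact_value=10.0, offset_value=6.0, offset_ratio=2.0)
# Same densities with a 1:1 offset ratio.
equal_area = ProjectOutcome(impact_value=10.0, offset_value=6.0, offset_ratio=1.0)

nnl_large = achieves_nnl(large_offset)  # 6.0 * 2.0 = 12.0 vs 10.0
nnl_equal = achieves_nnl(equal_area)    # 6.0 * 1.0 = 6.0 vs 10.0

# Hypothetical program with one project of unknown outcome.
rate = percent_success_known([True, False, None, True])
```

The first example shows why area-adjusted outcomes are tied to the offset multiplier: the same per-unit-area values produce different designations depending only on the ratio.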
Additionally, we noted two important aspects of offset design: whether the described offsets referenced the additionality of their associated conservation actions (i.e., whether the biodiversity gains at the offset were additional to what would have been present in the absence of the offset), and whether losses/gains were evaluated against a static or dynamic counterfactual (Bull, Gordon, Law, Suttle, & Milner-Gulland, 2014; McKenney & Kiesecker, 2010). We also assessed the internal validity of site-based assessments of individual biodiversity offsets, paying particular attention to potential selection bias and performance bias (Bilotta, Milner, & Boyd, 2014). We recorded information about the:
• study design (e.g., before-after-control-impact);
• control used (e.g., either impact-site or reference-site);
• sampling methods and whether those descriptions were sufficiently randomized or open to selection bias;
• and number of time periods sampled and whether this was sufficient to capture intertemporal ecological dynamics.

TABLE 1 Categorization of information from each study evaluating outcomes from biodiversity offsets or NNL policies

Category | Groupings | Inclusion criteria
Scale | Landscape | Assess changes in the total area of a particular land cover type regulated under a regional NNL policy (although note that some individual impacts within the geographical jurisdiction of the policy will not be regulated by the policy because of legal exemptions or illegal impacts).
 | Program | Assess the outcomes of a defined portfolio of offsets without necessarily comparing them with their associated impacts.
 | Project | Report the results of individual impact/reference and offset pairs.
Offset type | Creation | Result in the creation of new habitat where none existed previously.
 | Restoration | Restoration or enhancement of degraded habitats; may or may not result in additional habitat area.
 | Protection | Protection of existing habitats, may or may not involve conservation management. No additional area for conservation.
Data type | Ecological site-based | Primary data collected on site.
 | Expert judgment | Judgment about outcomes elicited from experts.
 | Official documentation | Data retrieved from official documentation such as mitigation permit files or offset registries.
 | Remote sensing | Use remote sensing to assess changes in habitat extent.
Outcome variable type (Table S3) | Community indices | General indices used to describe ecological communities (e.g., species richness; Simpson index). Do not account for species identity.
 | Community densities | Indices showing the abundance of an aspect of biodiversity per unit area (e.g., g/m² fish biomass).
 | Habitat area | Area of habitat.
 | Habitat quality | Quality of habitat (e.g., percentage coverage of vegetation types associated with the offset habitat type).
 | Indices of biotic integrity | Indices of biotic integrity (Karr 1981), partially account for changes in species identity.
 | Regulatory compliance | Degree to which a given compliance criterion has been met (compliance does not necessarily demonstrate the achievement of NNL).
 | Species population proxy | Direct monitoring or species proxy monitoring methods targeting a particular individual or set of species (e.g., population abundance; environmental indicators of species activity levels).

Overview of studies
Our searches returned 15,715 articles once duplicates were removed. After screening abstracts for relevance, we fully assessed 418 articles for inclusion (Table S1). Twenty-nine studies met our inclusion criteria (7% of potentially relevant studies), with a further three identified via in-article citations, leaving 32 studies from five countries (Table 1; Figure 1). Our database includes four landscape-scale, 18 program-scale, and 10 project-scale studies (covering 26 projects), and accounts for a minimum of 300,000 ha of offsets and 180,000 ha of impacts, representing approximately 2% of the global area of spatially explicit known offset implementation. In total, we identified 121 outcome variables (column 11,

FIGURE 1 Map of all study and project areas included in our review. Pie-charts indicate the number of projects/studies by region reporting achieving NNL, failing to achieve NNL, achieving a mixture of outcomes for different outcome variables, and for which no NNL designation could be made because the outcome variable was a measure of regulatory compliance

FIGURE 2 (a) Total number of studies/projects within our database achieving NNL. The number of studies/projects is disaggregated by spatial scale (b), offset type (c), and biodiversity type affected (d). NA represents either studies that presented outcome variables from which an NNL designation could not be determined (a), or studies where information on offset type was not provided (c). Studies evaluating the outcomes of bat mitigation actions aiming to achieve NNL in bat population status are categorized as "urban" (d)

ecological outcomes they reported related to whether regulatory compliance standards were met (e.g., percentage invasive species plant cover), which often do not explicitly aim to achieve NNL of biodiversity per se at project scales (Sudol & Ambrose, 2002). When treating each offset or NNL policy independently (N = 48), NNL was achieved for 17 assessments, not achieved for 15 assessments, and both successful and unsuccessful depending on the choice of outcome variable for eight assessments. No studies demonstrated the achievement of NNL in forested ecosystems or for avoided loss offsets (Figure 2) (dataset included in Supporting Information).

Outcomes of program-and landscape-scale evaluations
Four studies conducted landscape-scale evaluations of the area of land cover changes under the jurisdiction of NNL policies, with three finding that NNL was not achieved by area (Figure 2). No causal interpretation should be given to these results as other conservation policies may have been implemented simultaneously with NNL policies. Levrel, Scemama, and Vaissière (2017) and Carle (2011) focused on Florida and 20 counties across North Carolina, respectively. Both found that total wetland area decreased over their study periods (2001-2011 and 1994-2001), despite considerable restoration efforts attributable to wetland mitigation policy. Drielsma et al. (2016) evaluated the Southern Mallee Guidelines scheme in western New South Wales, Australia. The authors modeled biodiversity change attributable to the scheme, concluding that it broadly achieved the aim of maintaining or improving native vegetation. However, discounting modeled outcomes, the observed outcomes of the scheme were that over 40,000 ha of vegetated grazing lease were cleared and "offset" through the protection of other areas, leading to an overall net loss in vegetated habitat area. Lastly, Fickas, Cohen, and Yang (2016) found that NNL in wetland area in Willamette Valley (OR) was achieved since the formal adoption of the national No Net Loss policy goal and major clarifications to Section 404 of the Clean Water Act in 1990.
Of the 12 program-scale evaluations in the literature that included outcome variables from which NNL assessments could be made, seven reported achieving NNL (Figure 2). All seven used change in habitat area as outcome variables, and reported results from offset programs focused predominantly on habitat creation and restoration (BenDor, Brozovic, & Pallathucheril, 2007; Breaux et al., 2005; Harper & Quigley, 2005; Kettlewell et al., 2008; Kozich & Halvorsen, 2012; Robertson & Hayden, 2015). The other three studies that also used area as their outcome variable failed to achieve NNL, and all reported results from offset systems based predominantly on avoided loss offsets (Goldberg & Reiss, 2016; Morgan & Roberts, 2003). The remaining studies evaluated the success of bat mitigation in the United Kingdom under the objective of "NNL in local bat population status," and the percentage of offset sites in Isère, France, where the required offset habitat type or species was present. Here, NNL was achieved for neither bat presence nor bat abundance post-mitigation (categorized as "urban" in Figure 2; Stone, Jones, & Harris, 2013), and offset habitat/species presence varied from 61% to 73% (Bezombes, Kerbiriou, & Spiegelberger, 2019).

Outcomes of biodiversity offsets
Twenty-six biodiversity offsets from 10 studies were included in our database, of which we could make NNL designations for 24. Of these, nine achieved NNL for all given outcome variables, seven failed to achieve any, and eight achieved NNL for some outcome variables but not for others (Figure 3). There was not enough identifying variation in the data to statistically explore whether specific aspects of offset design, type, or ecology predicted the achievement of a higher percentage of total outcome variables. Nevertheless, it is noteworthy that 64% (7/11, Figure 3c) of projects with offset ratios >1 achieved NNL for all of their associated outcome variables compared with 17% for offsets with ratios ≤1 (2/12, Figure 3b).
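As a worked check of the percentages quoted above, the short Python sketch below recomputes them from the reported counts (7 of 11 and 2 of 12). The variable and function names are illustrative. In line with the text, no significance test is attempted.

```python
def percent(successes: int, total: int) -> int:
    """Percentage of projects achieving NNL for all outcome variables,
    rounded to the nearest whole number."""
    return round(100 * successes / total)


# Counts reported in the text for creation/restoration offset projects.
high_ratio_pct = percent(7, 11)  # offset ratio > 1  (Figure 3c)
low_ratio_pct = percent(2, 12)   # offset ratio <= 1 (Figure 3b)

# Projects in the two groups; the text notes 24 designations in total,
# one of which is an avoided loss offset excluded from both groups.
grouped_projects = 11 + 12
```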
Outcome values at offset sites relative to impact/reference sites varied nominally among outcome measures (Figure 4), although the data volume was insufficient to explore statistical differences. On average, assessments of habitat quality tended to find that the quality of offset sites was lower than that at impact sites.
For the eight project-based studies where offsets were ecologically compared with either their impact sites or reference sites, three met all our criteria for study validity (Garland, Wells, & Markham, 2017; Teels, Mazanti, & Rewa, 2004; Thorn et al., 2018). Two sampled control/offset sites at a single time-point and thus were unable to account for natural ecological variability in outcomes (Hegberg, Baker, & Pieper, 2010; Quigley & Harper, 2006a, but see justification in Quigley & Harper, 2006a), one did not report its sampling protocol and is thus open to sampling bias (Hegberg et al., 2010), and four based their NNL assessments on control data that were collected ≥5 years before data at the offset site (Hegberg et al., 2010; Lindenmayer et al., 2017; Murata & Feest, 2015; Pickett et al., 2013).

Outcomes of studies evaluating compliance
Ten studies evaluated the degree to which NNL implementation was meeting regulatory compliance standards at program scales (Figure 5). Compliance across NNL programs was imperfect, with no compliance rates exceeding 75% (Hill, Kulz, Munoz, & Dorney, 2013).

Reasons for NNL achievement or failure
The two most commonly cited reasons for a lack of NNL success were: failure of the specific conservation interventions applied by the offset (e.g., the offset species failing to respond as expected to the offsetting intervention); and offset implementation failures (Table 3). The most commonly cited reason for success was having high offset ratios. Additionally, Fickas et al. (2016) noted that NNL policy internalized impacts on wetlands that were previously not subject to regulation, thus potentially disincentivizing habitat conversion.

DISCUSSION
Our review reveals important insights about the state of the evidence base for NNL and biodiversity offsetting. We provide preliminary indications that: NNL has historically been more successful in wetland than forested ecosystems; avoided loss offsets are particularly risky; evaluations have so far predominantly used area-based outcome measures; there are potential problems with the validity of studies evaluating offset outcomes; and the most common reason for offset success appears to be the implementation of high offset ratios.

FIGURE 3 (a) Frequency distribution of the percentage of outcome variables achieved for each offset project in our sample where an NNL designation could be made (including one avoided loss offset that is excluded from (b) and (c); Thorn et al., 2018). (b) For all creation/restoration offset projects with a multiplier ≤1. (c) For all creation/restoration projects with a multiplier >1
We identify a substantial gap between the global implementation of NNL and the evidence base concerning ecological effectiveness. Sixty-seven percent of the world's offsets are applied in forested ecosystems, yet our review reveals that only four studies have assessed NNL outcomes from offsets applied to forest ecosystems or wildlife. Of these, none demonstrated that their associated NNL targets were achieved. Similarly, 20% of the world's offsets entail some form of protection or avoided loss. Yet, only six studies have assessed NNL outcomes from this common offset type, and none found that NNL was achieved.

Exploring unsuccessful outcomes of avoided loss and forest offsets
Avoided loss offsets appeared to be unsuccessful for multiple reasons. Critically, they necessarily lead to an immediate net loss in habitat area. This can be justified as a mechanism for preventing biodiversity loss if the background rate of biodiversity loss is sufficiently high. However, in the studies included here and the wider literature, it is evident that assumed rates of background declines are commonly higher than the actual rate, superficially justifying the use of avoided loss offsets when in reality gains only accrue many decades into the future (Reside et al., 2019). This issue is compounded if the "protection" afforded by offsets does not actually reduce the probability of loss, most commonly when sites that are not under threat of development receive "protection" (e.g., Thorn et al., 2018). Drielsma et al. (2016) justify the use of avoided loss on the grounds that biodiversity improvements on newly protected sites could offset the losses attributable to the reduction in overall habitat extent. Whether these condition gains are achieved in reality is questionable, especially considering the consistent ecological or implementation failure of conservation management interventions associated with offsets in our sample (Bezombes et al., 2019; Lindenmayer et al., 2017).
Many of the same reasons apply to explain the apparent failure of offsets focused on forest biodiversity, although identifying explanations unique to forests is challenging as four of five forest offset studies are also avoided loss offsets. Additionally, all forest studies came from Australia, where native vegetation offsets based predominantly on avoided loss have been criticized for facilitating high rates of deforestation and species declines (Reside et al., 2019). Nevertheless, both studies evaluating interventions aiming to offset impacts on forest species found that the interventions failed to deliver ecological equivalence, providing either lower quality or less-utilized habitat than that impacted by development (Lindenmayer et al., 2017; Thorn et al., 2018).

FIGURE 4 Box and whisker plots showing the upper and lower quartiles and exclusive medians of the percentage difference between outcome values at offset sites relative to impact/control sites, with outcome variables grouped into categories. Whiskers indicate the maximum/minimum values that fall within ±1.5 × inter-quartile range. Values > 0 indicate that the value at the offset site exceeded that at the impact site. Four outliers (represented by dots) not shown: for the "community densities" column, outliers occurred at 1469, 3093, 3426, and 4348. Outliers are likely explained by Quigley and Harper (2006a) containing several projects with unusually high offset ratios at several of the sites, and the use of stochastic community density-based outcome measures (e.g., number of invertebrates sampled/m²). Crosses denote the sample mean. See Table S3 for summary of which outcome variables were assigned to each category
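The quantities summarized in the Figure 4 caption (percentage difference at offset sites relative to impact/control sites, and whiskers bounded by 1.5 × the inter-quartile range) can be sketched in Python as below. This is an illustrative sketch, not the authors' plotting code. The function names and sample values are hypothetical, and the use of the "exclusive" quartile method is an assumption prompted by the caption's wording.

```python
import statistics


def pct_difference(offset_value: float, control_value: float) -> float:
    """Percentage difference of the offset site relative to the
    impact/control site; values > 0 mean the offset value is higher."""
    if control_value == 0:
        raise ValueError("control value must be non-zero")
    return 100 * (offset_value - control_value) / control_value


def whisker_limits(values):
    """Return (lower whisker, upper whisker, outliers).

    Whiskers are the most extreme values lying within 1.5 x IQR of the
    lower and upper quartiles; anything beyond is treated as an outlier.
    """
    q1, _median, q3 = statistics.quantiles(values, n=4, method="exclusive")
    iqr = q3 - q1
    low_fence, high_fence = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    inside = [v for v in values if low_fence <= v <= high_fence]
    outliers = [v for v in values if v < low_fence or v > high_fence]
    return min(inside), max(inside), outliers


# Hypothetical percentage differences for one outcome category; 1469 is
# one of the outlier values quoted in the caption.
sample = [-50, -20, -10, 0, 10, 20, 1469]
lower, upper, outliers = whisker_limits(sample)

# Hypothetical pair: offset value 12.0 against a control value of 10.0.
example_difference = pct_difference(12.0, 10.0)
```

Because the difference is expressed relative to the control value, a small control value combined with a high offset ratio can produce very large percentages, which is consistent with the caption's explanation of the outliers.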
On the planning side, May, Hobbs, and Valentine (2017) identify a number of shortcomings hindering Western Australia's native vegetation offset policies from achieving NNL, including a lack of contingency planning in the case of offset failure, insufficient reporting of offset outcomes, offset performance criteria being disconnected from actual ecological outcomes, and poor compliance. May et al.'s (2017) findings are indicative of the rest of the evaluations of compliance in our dataset, with variously defined compliance rates ranging from 4% to 75%. Imperfect compliance rates per se do not guarantee failure of NNL policies from an ecological perspective, as the effects of compliance failure might be outweighed by offset multipliers (Bull, Lloyd, & Strange, 2017). However, a recent global review including gray literature demonstrated that compliance with offset permit criteria often considerably exceeds the ecological functional performance of those offsets, indicating that achieving compliance is often insufficient to achieve NNL (Theis et al., 2019). Additionally, low compliance rates do indicate that regulatory enforcement of offset outcomes is often lacking, potentially demonstrating limited institutional interest in the true outcomes of offsetting, thus weakening the probability of NNL outcomes (Walker et al., 2009). There are rarely legal mechanisms for imposing financial penalties for non-compliance (Hahn & Richards, 2013). Improving monitoring alone will not guarantee improved outcomes (Kozich & Halvorsen, 2012): compliance likely requires strict enforcement, with regulators empowered to impose punishments when permits are violated (Gray & Shimshack, 2011). Such pecuniary enforcement measures have been demonstrated in the context of other environmental policies to have direct and indirect benefits, such as both increasing compliance rates within punished firms, and inducing spillovers improving compliance within unpunished firms (Gray & Shimshack, 2011). While improving compliance is likely key, if NNL policies fail to use an appropriate reference system (either the pre-impact site or control site) to define the compliance criteria for offsets, then even achieving full compliance may well fail to achieve NNL of biodiversity across the paired impacted and offset sites (Theis et al., 2019).

FIGURE 5 Percentage compliance and compliance criteria reported for regions in our dataset, with bar chart colors corresponding to the region providing the compliance values. Note that the type of reported compliance standards varies between studies, so rates are not comparable. PS denotes "performance standards". (a) Quigley and Harper (2006b)

Achieving NNL: True success or methodological artifact?
Despite little evidence for the effectiveness of some common offset types, one-third of all projects or studies in our database reported achieving NNL. All but one of the successful NNL outcomes occurred for wetland habitats or species, with 50% of wetland projects/studies where an NNL designation could be made achieving NNL. Additionally, all of the successful NNL outcomes occurred for creation or restoration offsets. We speculate that wetland restoration offsets might have higher NNL rates than other offset types in our dataset for two main reasons: first, wetlands display higher rates of ecological recovery than many other habitat types, and this recovery is more likely to reach reference conditions if the impacted wetland was itself degraded (relatively likely in areas undergoing development or construction). Second, the two main wetland offsetting policies covered by our dataset are Section 404 of the Clean Water Act in the USA and the Canadian policy of NNL in productive fish habitat. These rank among the oldest NNL policies, and both have undergone numerous refinements during their implementation (Hough & Robertson, 2009; Rubec & Hanson, 2009); thus, their effectiveness might exceed that of younger offset policies elsewhere.

TABLE 3 List of reasons cited for NNL policy/offset success or failure. The number of citations per reason should not be taken to indicate the importance of that reason, as there was variation between papers in the depth of their discussions of potential explanations

NNL outcome | Reason | Scale | References
NNL/offset failure, failure to achieve compliance | Avoided loss leading to an overall loss in area of natural habitats | Program | Morgan and Roberts, 2003; Gibbons et al., 2018
 | Compliance standards unrelated to ecological outcomes | Program | May et al., 2017
 | Conflict with development | Program | Shafer and Roberts, 2008
 | Conservation intervention failure | Program; project | Quigley and Harper, 2006b; Stone et al., 2013; Lindenmayer et al., 2017; Garland et al., 2017; Bezombes et al., 2019
 | Contradictions within permit requirements | Program | Quigley and Harper, 2006b
 | Failure to consider landscape context | Program | Van den Bosch and Matthews, 2017
 | Illegal trespassing | Program | Hill et al., 2013
 | Insufficient offset ratios | Program; project | Goldberg and Reiss, 2016; Stone et al., 2013; Quigley and Harper, 2006a
 | Invasive encroachment without management | Program | Van den Bosch and Matthews, 2017
 | Lack of additionality | Project | Thorn et al., 2018
 | Lack of contingency measures in case of offset failure | Program | May et al., 2017
 | Lack of data to demonstrate outcomes | Program | May et al., 2017; Stone et al., 2013; Bezombes et al., 2019
 | Lack of ecological equivalence | Project | Thorn et al., 2018; Teels et al., 2004
 | Lack of ecological suitability of creation offset site | Program | Hill et al., 2013; Kozich and Halvorsen, 2012; Quigley and Harper, 2006a; Shafer and Roberts, 2008
 | Lack of monitoring | Program; project | Quigley and Harper, 2006a; Quigley and Harper, 2006b
 | Lack of offset expertise | Program | Quigley and Harper, 2006b
 | Offset implementation failure | Program | Quigley and Harper, 2006b; May et al., 2017; Morgan and Roberts, 2003; Shafer and Roberts, 2008; Bezombes et al., 2019
 | Temporal lag | Program; project | Quigley and Harper, 2006a
 | Unregulated impacts | Landscape; Program | Goldberg and Reiss, 2016; Carle, 2011
NNL/offset success | Bringing impacts under regulation | Landscape | Fickas et al., 2016
 | High offset ratio | Program; project | Pickett et al., 2013; Robertson and Hayden, 2015; Harper and Quigley, 2005
 | Simple biodiversity metric | Project | Pickett et al., 2013
An additional key reason for biodiversity offset success appears to be high offset ratios. This finding should be considered in the context of recent literature encouraging practitioners not to simply rely upon high multipliers to solve all offset implementation problems (Bull et al., 2017). However, within our database, high multipliers appear to be a predictor of NNL success. For individual species-based offsets, this may be because high offset multipliers can be a useful mechanism for increasing habitat availability for the offset species and thus easing density-dependence constraints within the re-establishing population (Pickett et al., 2013). For habitat-based offsets, high multipliers might promote the achievement of NNL if best-practice biodiversity metrics which account for both habitat extent and condition are used (Bezombes et al., 2018), although care must be taken to constrain trades between habitat condition types to avoid trading large extents of biodiversity-poor habitat for small extents of valuable habitat (Carver & Sullivan, 2017).
However, it is unclear to what degree these perceived predictors of success (wetlands and high multipliers) reflect true trends, or whether they reflect the choice of outcome variables used to assess NNL. At program scales, seven of 10 wetland studies where a NNL designation could be made found that NNL was achieved, but all studies used area as an outcome variable. At landscape scales, two of three wetland studies found that NNL was not achieved, and again all used area as their outcome variable. At project scales, nine of 21 offsets achieved NNL, yet for seven of these successes the outcome variables were community densities. Six of these successes came from Quigley and Harper (2006a), who calculated whether NNL was achieved for community density outcomes while accounting for the offset multiplier (i.e., to infer whether the overall abundance of the community group in question was higher for the offset than the impact site). Thus, these successful NNL outcomes are also linked inextricably to offset area. Therefore, with our current dataset we cannot definitively answer the question of whether true NNL in biodiversity is more likely for wetlands than other habitat types, because many of the current metrics used to assess NNL in the literature confound offset area (and the offset multiplier) with increases in biodiversity. This is problematic because habitat area alone does not necessarily reflect habitat quality or community composition (Dale & Gerlak, 2007), and is thus widely recognized as an unsatisfactory biodiversity metric (Quétier & Lavorel, 2011). Additionally, this review cannot indicate the direction of causality: projects with larger offset ratios might be more likely to be successful, but plausibly larger offset ratios might merely be more strongly embedded into older NNL policies.
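The confounding of offset area with biodiversity gains can be made concrete with a small worked example. The sketch below is a minimal illustration, not the calculation used by Quigley and Harper (2006a): it assumes that overall abundance is simply density multiplied by area, and the function name and all numbers are hypothetical.

```python
def net_abundance_change(impact_density, impact_area, offset_density, offset_area):
    """Area-weighted abundance at the offset minus that lost at the impact site.

    Assumption (ours, for illustration): abundance = density * area.
    """
    return offset_density * offset_area - impact_density * impact_area


# Hypothetical values: the offset supports only half the density of the impact site.
impact_density = 10.0  # individuals per ha at the impact site
offset_density = 5.0   # individuals per ha at the offset site
impact_area = 1.0      # ha lost to development

for multiplier in (1, 2, 3):
    offset_area = impact_area * multiplier
    change = net_abundance_change(impact_density, impact_area, offset_density, offset_area)
    verdict = "NNL achieved" if change >= 0 else "net loss"
    print(f"multiplier {multiplier}: change = {change:+.1f} ({verdict})")
```

In this illustration the offset habitat is of lower quality at every multiplier, yet the verdict switches from net loss to NNL once the multiplier reaches 2. The outcome is driven by area alone, which is the confound described above.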

Influence of spatial scale on NNL outcomes
The perceived discrepancy in outcomes between landscape-scale and program-scale evaluations of NNL is likely because program-scale evaluations only account for registered offsets/impacts, yet unregulated or exempt impacts may well make the difference between achieving NNL or not (Maron et al., 2018). For example, in Florida between 2001 and 2011, mitigation banking restored 58,575 ha of wetlands, yet across the state, a net 5600 ha/year was lost during the same time period (Levrel et al., 2017), possibly because the Clean Water Act applies only to "jurisdictional wetlands," so many wetland impacts escape regulation. Discrepancies between the apparent success of program-scale area-based evaluations and landscape-scale ones indicate that NNL policies are likely undermined if some impacts are unreported or otherwise exempt from regulation (Reside et al., 2019). Thus, the scope of impacts falling under these policies should be widened to include all impacts and minimize opportunities to avoid NNL legislation.
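The difference between the two accounting scales can be sketched as a simple area ledger. This is an illustrative sketch only: the class, its field names, and the figures are hypothetical and are not drawn from the Florida data or any reviewed study.

```python
from dataclasses import dataclass


@dataclass
class WetlandLedger:
    """Hypothetical area ledger (ha) for one jurisdiction over one period."""
    regulated_loss_ha: float    # permitted impacts covered by the NNL policy
    offset_gain_ha: float       # area restored or created as offsets
    unregulated_loss_ha: float  # exempt or unreported impacts

    def program_scale_net(self) -> float:
        # A program-scale evaluation sees only registered impacts and offsets.
        return self.offset_gain_ha - self.regulated_loss_ha

    def landscape_scale_net(self) -> float:
        # A landscape-scale evaluation also captures impacts outside the policy.
        return self.program_scale_net() - self.unregulated_loss_ha


ledger = WetlandLedger(regulated_loss_ha=1000.0,
                       offset_gain_ha=1500.0,
                       unregulated_loss_ha=2000.0)
print(ledger.program_scale_net())    # 500.0
print(ledger.landscape_scale_net())  # -1500.0
```

With these made-up figures the program appears to deliver a net gain, while the landscape as a whole loses wetland area, because the unregulated losses never enter the program's books.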

Outcomes of individual biodiversity offsets
For individual offsets, the outcome variables used were more complex than merely habitat area, and generally adapted to the particular contexts of their associated NNL policies (e.g., Quigley & Harper, 2006a used indicators of habitat productivity, variables representing habitat quality and community densities, to assess whether offsets achieved their policy target of NNL of productive fish habitat). Notably, we found only three studies that attempted to assess whether offset and impact sites were ecologically equivalent at the community level. For offsetting to be demonstrably ecologically equivalent, it should capture aspects of species identity or community composition: two studies accounted for community composition by using indices of biotic integrity (Hegberg et al., 2010; Teels et al., 2004), and one by assessing whether habitat type, quality, and structure were similar to that at the impact site (Thorn et al., 2018). Given the strong emphasis in best-practice principles on achieving ecological equivalence (McKenney & Kiesecker, 2010; Quétier & Lavorel, 2011), the lack of empirical evaluations demonstrating equivalence is a clear gap.
Additionally, we found a number of methodological issues with offset studies, with three of the eight studies that conducted site-based ecological assessments of biodiversity losses and gains meeting our criteria for study internal validity. Alongside opportunities for selection bias and the measurement of biodiversity at a single timepoint, which therefore does not account for ecological dynamics, the most common issue was the use of controls that are open to potential performance bias (Bilotta et al., 2014). Four studies used controls from ≥5 years before measuring biodiversity at the offset site, which can be justified on the grounds that development projects take years to be implemented, but it cannot be ruled out that other factors influenced changes in biodiversity over this time, thus obscuring the true impact of the NNL policy on biodiversity. Additionally, although not identified in these studies, evaluators should beware of pseudoreplication when assessing whether NNL is achieved across multiple sites.
Combined, these points emphasize the need for higher-quality evidence to understand when NNL is defensible as a conservation strategy. Our review identified just one study meeting our inclusion criteria that compared NNL outcomes with a robust counterfactual. Generally, the quality of impact evaluations for NNL appears to be lagging behind that of evaluations in other areas of conservation and environmental policy, such as payments for ecosystem services (Pynegar, Jones, Gibbons, & Asquith, 2018), protected areas (Miteva, Pattanayak, & Ferraro, 2012), commodity sustainability certification (Carlson et al., 2018), and forest policy (Simmons et al., 2018). Recognizing that the true causal impact of conservation policies can be confounded by biases in those receiving conservation treatments, there is an increase in applications of experimental, quasi-experimental, and matching methods to improve our causal understanding of policy effectiveness (Ferraro & Hanauer, 2014). The first study of this kind assessing the effectiveness of NNL-related policies focused on species conservation banks (Sonter, Barnes, Matthews, & Maron, 2019), but most of these do not have NNL requirements, and to date, there remain no NNL evaluations using advanced causal inference. This is, therefore, a vital area of future research.
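To illustrate why a before-only baseline is open to bias, the sketch below contrasts a simple before-after comparison with a before-after-control-impact (difference-in-differences) estimate. This is a generic textbook construction, not a method taken from any study in this review; the function names and the species-richness values are hypothetical.

```python
def before_after_estimate(offset_before, offset_after):
    """Naive estimate: attributes all change at the offset site to the offset."""
    return offset_after - offset_before


def baci_estimate(offset_before, offset_after, control_before, control_after):
    """Difference-in-differences: subtracts the background change at a control site.

    Assumption (ours): without the offset, the offset site would have followed
    the same trend as the control site.
    """
    return (offset_after - offset_before) - (control_after - control_before)


# Hypothetical species richness measured five years apart.
offset_before, offset_after = 20.0, 26.0
control_before, control_after = 21.0, 25.0

print(before_after_estimate(offset_before, offset_after))                         # 6.0
print(baci_estimate(offset_before, offset_after, control_before, control_after))  # 2.0
```

In this made-up case two-thirds of the apparent gain at the offset site would have occurred anyway, which is the kind of background change a baseline measured years earlier cannot separate from the effect of the policy.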
Biodiversity offsets receive disproportionate attention compared to the other stages of the MH (Hough & Robertson, 2009). However, the effectiveness of NNL is fundamentally reliant on robust implementation of avoidance and minimization measures (Phalan et al., 2018; von Hase & ten Kate, 2017). Our current understanding of the effectiveness of these stages is limited. The major difficulty in evaluating avoidance is that only part of the process of avoidance is observable: permit denials and evaluations of alternative impact sites common to major infrastructure projects. The evidence from these stages would imply that avoidance is weakly applied, as numerous studies have demonstrated low rates of project rejection on environmental grounds and weak justifications for why final project sites were chosen (Clare, Krogman, Foote, & Lemphers, 2011; Phalan et al., 2018). However, recent work from South Africa has found these observable characteristics to be imperfect reflections of the actual avoidance embedded in the planning process, as many decisions on avoidance happen through informal consultations with regulators in advance of project proposal (Sinclair, 2018).

Policy implications
Finally, are the findings of this review generalizable and of policy relevance? Our search language is a limitation, and while there is evidence that English captures most of the literature on NNL implementation globally, NNL systems in countries such as Germany or Brazil may not have been captured in our review. Furthermore, our sampling strategy is biased away from the gray literature. However, the direction of this bias is unclear (Theis et al., 2019): plausible arguments could be made both for a selection bias toward publishing unsuccessful NNL results in the academic conservation literature, and toward not publishing unsuccessful results in the gray literature because of a fear of criticism for legislators or vested interests. Additionally, although our review was global, the evaluations of actual NNL outcomes identified in our review are biased toward high-income countries with strong institutions. Thus, our review may overestimate the probability of achieving NNL outcomes in countries with weaker environmental legislation. However, strong institutions are far from a guarantee of a successful NNL policy: details of NNL design are vitally important (Maron et al., 2018). Therefore, without overstating our findings, we feel there are generalizable recommendations that can be derived from our review:
• policymakers should be aware that without significant improvements to existing policies, NNL policies in forested habitats or utilizing avoided loss offsets are unlikely to achieve NNL;
• improving compliance with NNL policies is essential for achieving improved ecological outcomes (which may come from mandating some form of penalty for noncompliance);
• and it is important to move beyond area-based outcome measures when implementing NNL.
With $60-70 trillion committed to infrastructural expansion by 2030 (Laurance et al., 2015), it is essential that we develop solutions that fully address the unmitigated biodiversity impacts of infrastructural expansion. If we are to achieve NNL of biodiversity, it is an urgent priority to develop the evidence base to understand what works, and when.