Implications of Goodhart's Law for monitoring global biodiversity loss
Adrian C. Newton, Centre for Conservation Ecology and Environmental Science, School of Applied Science, Bournemouth University, Talbot Campus, Poole, Dorset BH12 5BB, UK. Tel: +44 (0)1202 965670. E-mail: firstname.lastname@example.org
Increasing efforts have recently focused on development of indicators for monitoring biodiversity loss, stimulated by development of the “2010 target.” Such efforts have failed to consider Goodhart's Law, which states that once an indicator is made into a policy target, then it will lose the information content that qualifies it to play its role as an indicator. The implications of Goodhart's Law for monitoring biodiversity are examined with specific reference to the IUCN Red List Index (RLI). According to Goodhart's Law, use of the RLI as an indicator could affect how conservation actions are targeted and how the Red List assessment is conducted, potentially undermining the assessment process itself. The use of targets in conservation policy and the associated development of indicators should therefore be undertaken with caution. Specifically, to support monitoring of global biodiversity loss, systems should be put in place to prevent the manipulation of indicators and the assessments on which they are based, to ensure that the information they provide is objective and reliable.
In 2002, parties to the Convention on Biological Diversity (CBD) committed to “achieve by 2010 a significant reduction of the current rate of biodiversity loss” (Decision VI/26, Secretariat of the Convention on Biological Diversity 2003). This “2010 target” was also incorporated into the United Nations Millennium Development Goals (UN 2008) and has subsequently been reflected in European policy (Mace & Baillie 2007). Monitoring of progress toward this target has stimulated the development and implementation of appropriate indicators, an issue to which substantive efforts have recently been devoted (Mace & Baillie 2007; Jones et al. 2011).
The development of indicators for monitoring biodiversity loss represents a significant research challenge (Balmford et al. 2005), as available data are often biased, unrepresentative, and incomplete. Mace and Baillie (2007) note that the systematic development, testing, and validation of appropriate indicators are still at an early stage, and consequently little information is available on the reliability, relevance, and cost-effectiveness of such indicators. Similarly, Walpole et al. (2009) highlight the problems of data availability, consistency, relevance, and uneven taxonomic and geographic coverage, even for those global indicators that are relatively well developed. In addition, some indicators are weak proxies for biodiversity, because of the reliance on information collected for purposes other than monitoring biodiversity loss. According to Walpole et al. (2009), the most well-developed direct measures of biodiversity are species indicators, such as the IUCN Red List Index (RLI).
The RLI has been developed by the IUCN and partners, explicitly to monitor biodiversity loss. The Index is based on the IUCN Red List of Threatened Species™, which is widely considered to be the leading assessment of the extinction risk of species. The Red List involves the application of quantitative criteria based on population size, distribution area, and rate of decline, to assign species to different categories of relative extinction risk (IUCN 2001). The RLI uses information from the Red List to assess changes in extinction risk of groups of taxa. Details of how the RLI is calculated are presented by Butchart et al. (2007). Essentially, the number of taxa in each Red List category is multiplied by a weight for each category, then these values are summed and expressed as a value relative to all taxa being classified as extinct. RLI values therefore vary between 0 and 1, referring to the situations where all taxa are classified as Extinct or as Least Concern, respectively. Trends in the RLI over time are determined from the number of species changing Red List category between assessments owing solely to improvements or deteriorations in status (i.e., excluding revisions owing to revised taxonomy, improved knowledge, or revised criteria). Trends in the RLI can only be calculated when all species in a group have been assessed at least twice (Butchart et al. 2007).
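The calculation described above can be sketched in a few lines of Python. This is a minimal illustration, not the IUCN implementation: it assumes the index takes the form of one minus the weighted sum divided by the "all Extinct" worst case, and it uses the category weights from Butchart et al. (2007) that are quoted later in this article. The function name `red_list_index` and the ten-taxon example group are hypothetical.

```python
from collections import Counter

# Category weights as quoted later in this article (Butchart et al. 2007).
WEIGHTS = {
    "Least Concern": 0,
    "Near Threatened": 1,
    "Vulnerable": 2,
    "Endangered": 3,
    "Critically Endangered": 4,
    "Extinct": 5,
}


def red_list_index(categories):
    """Return the RLI for an iterable of Red List category names.

    Assumed form: 1 - (sum of category weights) / (weight of Extinct * number of taxa).
    """
    counts = Counter(categories)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("at least one assessed taxon is required")
    weighted_sum = sum(WEIGHTS[cat] * n for cat, n in counts.items())
    worst_case = WEIGHTS["Extinct"] * total  # every taxon classified as Extinct
    return 1 - weighted_sum / worst_case


# Hypothetical group of ten taxa, for illustration only.
example_group = (
    ["Least Concern"] * 6
    + ["Vulnerable"] * 2
    + ["Endangered"]
    + ["Critically Endangered"]
)
print(round(red_list_index(example_group), 3))
```

Under these assumptions, a group consisting entirely of Least Concern taxa scores 1 and a group consisting entirely of Extinct taxa scores 0, matching the bounds described in the text.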
Calculation of the RLI has been undertaken for a variety of different groups of species, including birds and amphibians (Butchart et al. 2005b, 2007). The RLI was also employed by Butchart et al. (2010) as one of 31 indicators for assessing global progress toward the “2010 target.” In common with a number of other indicators, the RLI (calculated for mammals, birds, amphibians, and corals) displayed negative trends, leading the authors to conclude that at the global scale, it is highly unlikely that the “2010 target” has been met. Stuart et al. (2010) suggest that further development of the Red List would provide a “solid basis” for monitoring biodiversity trends, through calculation of the RLI both at national and global scales. However, use of the Red List as a biodiversity indicator has previously been the subject of some debate. For example, Possingham et al. (2002) suggested that threatened species lists have limited value as indicators of environmental change because of uneven taxonomic coverage, variation in observational effort, and the fact that the changes in the lists often reflect change in knowledge of status rather than change in the status itself. This latter point was demonstrated by Quayle and Ramsay (2005) in their analysis of vertebrate species included in the British Columbia Red List. It is also recognized that the Red Listing process is often uncertain (Akçakaya et al. 2000; Newton 2010) and may even be inaccurate (Godfrey & Godley 2008; Webb 2008). However, other authors have emphasized the value of the Red List as a tool for monitoring biodiversity loss (e.g., see Lamoreux et al. 2003; Rodrigues et al. 2006), and potentially, the RLI approach can overcome some of the limitations identified previously (Butchart et al. 2005a, b).
Irrespective of the limitations of the IUCN Red List, there is another reason for caution when using it to monitor progress toward policy targets: Goodhart's Law. This article first describes Goodhart's Law, then examines its implications for developing indicators of biodiversity loss, with specific reference to the RLI. While the RLI is used here to explore the possible implications of Goodhart's Law, it should be emphasized that the points made here potentially apply to any other biodiversity indicator used in a similar way.
Professor Charles Goodhart was Chief Adviser to the Bank of England in the 1970s. He stated his Law as follows:
“Any observed statistical regularity will tend to collapse once pressure is placed upon it for control purposes” (Goodhart 1984).
Goodhart's Law has since been restated more generally as follows:
“When a measure becomes a target, it ceases to be a good measure” (Hoskin 1996).
Goodhart's Law was originally developed in the context of conducting monetary policy on the basis of targets (Goodhart 1984). It refers to the observation that previously estimated relationships (specifically between the nominal interest rate and the nominal money stock) had broken down. The “Law” states that this is inevitable when policy makers use statistically estimated relationships as the basis for policy targets (Chrystal & Mizen 2003). While it has been expressed in a variety of ways, the essence of Goodhart's Law is that once an indicator or other surrogate measure is made into a policy target, then it will lose the information content that would qualify it to play its role as an indicator.
Goodhart's Law has been highly influential in the field of monetary policy and economics, in which it was developed (Chrystal & Mizen 2003; Evans 1985; Gotz-Kozierkiewicz 2010; Issing 1997; Takeda & Ueda 2006). However, it has profound implications for the selection of high-level policy targets in any context (Hoskin 1996). The Law has also been examined in the fields of psychiatry (Robinson 2002) and healthcare (Price et al. 2010), but its application can be observed in many other areas. Examples include the manipulation of waiting lists by hospitals to improve their apparent performance (Buchanan & Storey 2010); amendment of teaching strategies toward “teaching to the test,” in response to use of league tables to monitor performance of educational institutions (Goldstein 2004); and manipulation of GDP by national governments (Nye & Moul 2007). These examples capture the mechanism underpinning Goodhart's Law: if their performance is being monitored, people tend to alter their behavior accordingly.
Although Goodhart's Law has been little explored previously in relation to biodiversity conservation, recent policy initiatives illustrate its possible application. For example, in the United Kingdom, rapid declines in the abundance of farmland birds led to their inclusion in the UK's suite of indicators of sustainable development (Vickery et al. 2004). As a direct result, a substantial research effort was subsequently devoted to examining the causes of decline in farmland birds. This has led to changes in farming practice, which have had some success in slowing or even reversing the observed declines (Robinson 2010). While this can be viewed as a conservation success story, it has also potentially undermined the use of farmland birds as a sustainability indicator. As policy and management interventions have focused specifically on birds, with the aim of improving this indicator, bird abundance is arguably now less representative of the general state of the natural environment than it was hitherto. Consequently, any increase in the indicator is more likely to be a measure of specific response actions than of any general improvement in the state of the environment.
This highlights the problem encapsulated by Goodhart's Law. To be effective, biodiversity indicators should be correlated with some environmental measure (Niemi & McDonald 2004). However, once a measure is declared as important and policy aims to influence it, the underlying correlation will be reduced. This is because people will tend to affect the statistic in whichever ways can be most readily achieved. As a result, indicators become decoupled from the underlying process that they are supposed to indicate, and indicator values will become artificially inflated without addressing the underlying problem.
Goodhart's Law and the IUCN RLI
What are the implications of Goodhart's Law in the context of using the RLI to monitor global biodiversity loss? I propose that there are two main potential outcomes. First, it might affect how conservation actions are targeted. Second, it could affect how the Red List assessment process is conducted.
It is currently being proposed that the RLI should be used as a national-scale indicator for reporting to the CBD (Stuart et al. 2010; Walpole et al. 2009), presumably in combination with other indicators, as presented by Butchart et al. (2010). If this were implemented, individual countries might seek to improve their RLI scores. This could be achieved by targeting their conservation actions accordingly. The RLI is calculated by weighting different categories of threat (i.e., Extinct = 5, Critically Endangered = 4, Endangered = 3, Vulnerable = 2, Near Threatened = 1, Least Concern = 0; Butchart et al. 2007). Those actions that are likely to result in the greatest change in the value of the RLI are likely to receive greater priority in terms of conservation action. How might this translate into practice? One example might be to target species with high categories of threat and hence high weight values, for example, through programs to reintroduce species that have become nationally rare or even extinct. In the United Kingdom, successful recent examples of such reintroductions include red kite and white-tailed eagle (Robinson 2010). Collectively such initiatives could have a significant impact on overall RLI scores. Potentially, use of the RLI could lead to such interventions being preferred over more wide-ranging interventions, such as changing land use patterns, which might benefit a much larger suite of species, including those that are not currently threatened. Furthermore, actions aimed at improving the trends in RLI might conflict with alternative approaches to conservation priority setting, such as those based on cost effectiveness (Joseph et al. 2008).
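The incentive described above can be made concrete with a small arithmetic sketch. It assumes, as an illustration only, that the RLI is one minus the weighted sum divided by the "all Extinct" worst case, so that moving a single taxon between categories changes the index by the difference in weights divided by five times the number of taxa. The helper `rli_gain`, the group size of 100 taxa, and the before and after categories are all hypothetical.

```python
# Category weights as listed in the text (Butchart et al. 2007).
WEIGHTS = {
    "Least Concern": 0,
    "Near Threatened": 1,
    "Vulnerable": 2,
    "Endangered": 3,
    "Critically Endangered": 4,
    "Extinct": 5,
}


def rli_gain(before, after, n_taxa):
    """Change in RLI when one taxon moves from `before` to `after`.

    Assumes RLI = 1 - sum(weights) / (weight of Extinct * n_taxa).
    """
    return (WEIGHTS[before] - WEIGHTS[after]) / (WEIGHTS["Extinct"] * n_taxa)


N_TAXA = 100  # hypothetical size of the assessed group

# Reintroducing one nationally extinct species that then qualifies as Endangered.
reintroduction = rli_gain("Extinct", "Endangered", N_TAXA)

# Downlisting one Near Threatened species to Least Concern.
downlisting = rli_gain("Near Threatened", "Least Concern", N_TAXA)

# A land-use change that benefits a species already listed as Least Concern.
non_threatened = rli_gain("Least Concern", "Least Concern", N_TAXA)

print(reintroduction, downlisting, non_threatened)
```

Under these assumptions the single reintroduction moves the index twice as far as the downlisting, while any benefit to species already classed as Least Concern leaves the index unchanged, however many such species are helped.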
RLI scores could also be improved by focusing conservation actions on those species in which a change in listing is easiest to achieve, rather than those that are most threatened or are otherwise most deserving of conservation prioritization. The precise choice of actions would depend on the criteria under which the taxon was listed. During Red List assessments, species are assigned to one of a series of categories by applying five quantitative criteria, largely based on population size and geographic range size (IUCN 2001). Taxa that meet the appropriate threshold for at least one of the five criteria may be assigned to one of the threatened categories (IUCN 2001). The status of a taxon listed under criterion D, for example, might be improved by introducing ex situ individuals into the wild and ensuring that they breed successfully, as this criterion relates to taxa with a very small or restricted population size. In reality, many taxa are listed under more than one criterion. In such situations, improvement of an RLI score would depend on the relative influence of different criteria on the overall categorization.
Use of the RLI as an indicator could also affect the Red List assessment process itself. At present, Red List assessments at the national scale are typically undertaken by national Red List authorities established for the purpose, who may have the power to decide which groups of species are included in future assessments. Use of the RLI as an indicator might act as a disincentive to conducting further Red List assessments on specific groups in order to avoid negative impacts on scores. Alternatively, RLI scores could be improved by prioritizing particular groups of taxa for future assessments, for example those that are believed to be at low risk of extinction. The uncertainty that is an inherent part of the Red Listing process, which includes the explicit use of inference, also makes it susceptible to manipulation or even abuse. The potential for this is illustrated by the uncertainty surrounding some assessments such as those of sea turtles (Godfrey & Godley 2008; Webb 2008), and the phenomenon of taxonomic inflation (Isaac et al. 2004), which could lead to increased numbers of taxa featuring on threatened species lists. While it is recognized that IUCN has established an oversight and peer-review process in order to ensure that standards are met, there is nevertheless scope for manipulating the assessments that are conducted, particularly at the national scale.
An example of the RLI influencing how the Red List assessment is conducted is provided by the recent IUCN Sampled Red List Index for Plants (Brummitt & Bachman 2010), for which 1,500 plant species from five major plant groups were randomly selected “as representative of the other flowering plants.” This assessment was conducted following the suggestions made by Baillie et al. (2008), who proposed a Sampled RLI (SRLI) approach for large, speciose taxonomic groups such as plants, to support monitoring progress toward the 2010 target. This contrasts with the more comprehensive assessments of the bird, mammal, amphibian, and coral RLIs described by Butchart et al. (2010). According to Goodhart's Law, there is a risk that use of the SRLI to monitor global biodiversity will result in conservation efforts focusing on those species that are included in the sample, which would undermine the degree to which they are representative of other taxa. This highlights a potential risk of implementing the SRLI approach (Baillie et al. 2008).
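The risk to a sampled index can be illustrated with a small deterministic sketch. Everything in it is hypothetical: a group of 20 taxa, a sample of four that initially mirrors the group exactly, and conservation effort that moves only the sampled taxa one category toward Least Concern. The index is assumed to take the form of one minus the weighted sum divided by the "all Extinct" worst case, and the names `rli` and `downlist` are illustrative helpers rather than part of any IUCN tool.

```python
# Categories ordered by weight 0..5, as listed in the text (Butchart et al. 2007).
ORDER = [
    "Least Concern",
    "Near Threatened",
    "Vulnerable",
    "Endangered",
    "Critically Endangered",
    "Extinct",
]
WEIGHT = {cat: i for i, cat in enumerate(ORDER)}


def rli(categories):
    """Assumed form: 1 - sum(weights) / (weight of Extinct * number of taxa)."""
    return 1 - sum(WEIGHT[c] for c in categories) / (WEIGHT["Extinct"] * len(categories))


def downlist(category):
    """Move one step toward Least Concern (a genuine improvement in status)."""
    return ORDER[max(WEIGHT[category] - 1, 0)]


# Hypothetical group of 20 taxa; the first four stand in for the monitored sample.
group = ["Vulnerable", "Least Concern", "Endangered", "Near Threatened"] * 5
sample_idx = [0, 1, 2, 3]

full_before = rli(group)
sample_before = rli([group[i] for i in sample_idx])

# Conservation effort is directed only at taxa that appear in the sample.
targeted = [downlist(c) if i in sample_idx else c for i, c in enumerate(group)]

full_after = rli(targeted)
sample_after = rli([targeted[i] for i in sample_idx])

print(round(sample_before, 2), round(sample_after, 2))
print(round(full_before, 2), round(full_after, 2))
```

In this constructed case the sampled and full indices agree before the intervention, but afterwards the sampled index rises five times as far as the index for the whole group, so the sample no longer represents the taxa it was drawn from.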
The “2010 target” represents an important milestone in development of global policy relating to biodiversity conservation. Understandably, it has been used by the conservation science community as justification for developing indicators of biodiversity loss. Some indicators, such as the RLI, are being actively promoted in this context (Stuart et al. 2010). The implication of Goodhart's Law is that this should be undertaken with caution. Use of the RLI to monitor progress toward policy targets could potentially undermine the Red List assessment process itself. This risk has not been considered previously, but the evidence from other sectors is clear: Goodhart's Law is not an abstract theory, but a consequence of human nature. In addition, the risks do not apply solely to the RLI, but potentially to any indicator used to monitor biodiversity, including all of those used by Butchart et al. (2010).
What are the implications of Goodhart's Law for monitoring global biodiversity loss? At the 10th meeting of the CBD Conference of the Parties (COP 10), held in Nagoya, Japan, in October 2010, new targets were identified as part of the CBD Strategic Plan leading up to 2020 (Normile 2010). The CBD therefore appears to be committed to the principle of target-setting as a component of international conservation policy, despite the risks inherent in Goodhart's Law. If such global targets are to be set, it should be recognized that this will have implications for any indicators used to monitor progress toward such targets. Specifically, systems should be put in place to prevent manipulation of the indicators and the assessments on which they are based, to ensure that the information they provide is objective and reliable. Use of multiple indicator sets, including measures of pressure as well as state variables, could help reduce scope for indicator manipulation. There is arguably a need for an independent authority to manage the biodiversity assessment and reporting process, to ensure that standards are met. Such an authority would need to be adequately resourced. Potentially, the proposed Intergovernmental Science-Policy Platform on Biodiversity and Ecosystem Services (Larigauderie & Mooney 2010) could fulfill this role. In addition, there is a clear need for a comprehensive and systematic monitoring program for biodiversity (Pereira & Cooper 2006), which should be open, transparent, and independent from policy reporting processes. Finally, there is a need for caution in interpreting the information provided by any indicators that are used; at best, they can only provide a partial indication of the status and trends of global biodiversity, and this needs to be appreciated by the decision-makers who employ them.
Thanks to the four referees whose comments helped improve the text.