
Opportunities for improving the rigor of management effectiveness evaluations in protected areas


Editor: Prof. Atte Moilanen

Carly N. Cook, School of Biological Sciences, University of Queensland, St. Lucia, Queensland 4072, Australia. Tel: +61-7-3365-4723; Fax: +61-7-3365-6019. E-mail: cook@uq.edu.au


Management effectiveness evaluations are an increasingly common approach to measuring conservation outcomes within protected areas. While these evaluations have the potential to provide valuable data to guide management, the accuracy of evaluation data is not reported. We investigated how evaluation data are collected and used, the criticisms made of evaluation methods, and the processes employed to address criticisms or ensure accuracy. We found that most evaluation tools use qualitative indicators of management success and rely heavily on the knowledge of protected area managers. Criticisms of the evaluation methods have led to improvements to the data collection process, but the precision and accuracy of these data are almost never measured. We believe that measuring the accuracy of evaluation data will provide important opportunities to improve the rigor of future evaluations and build confidence in the use of these data for adapting protected area management.


Protected areas are described as the cornerstone of conservation, but despite an increasing investment in protected areas (Jenkins & Joppa 2009), biodiversity continues to decline on a global scale (Butchart et al. 2010). The apparent mismatch between conservation activity and achieving desired conservation outcomes has highlighted the need to evaluate the effectiveness of protected area management, so this investment is not being squandered (Ervin 2003). Management effectiveness evaluation (MEE) involves more than just monitoring the status of biodiversity within protected areas (Hockings 2003) but also attempts to understand the relationship between management actions and ecosystem condition (Timko & Innes 2009). The first step in any monitoring or evaluation program should always be to set appropriate objectives, and MEEs examine existing management objectives and practices to identify where changes or improvements may be needed (Hockings et al. 2006), much like the management strategy evaluation approach to ecosystem-based management (Smith et al. 2007).

The potential for MEEs to make a valuable contribution to conservation is being increasingly recognized within the global conservation community. The Convention on Biological Diversity (http://www.cbd.int) now calls on signatory countries to evaluate management effectiveness in 60% of their protected areas by 2015 (CBD 2010), and the United Nations uses MEEs as an indicator of biodiversity conservation (Walpole et al. 2009). To assist those wanting to develop MEE tools, the World Commission on Protected Areas (WCPA) produced a guiding framework (Hockings et al. 2006). The framework acknowledges that no single design will suit all protected areas but that there are basic elements that can form the core of all evaluations. Six elements of management effectiveness are described in the framework (Figure 1): (1) reserve context, (2) management planning, (3) inputs, (4) management processes, (5) outputs, and (6) outcomes. Generally, qualitative indicators are well suited to evaluating the context (e.g., key values of the protected area), planning (e.g., appropriate and clearly stated management objectives), and management processes (e.g., the implementation of identified management actions) within protected areas. Quantitative indicators are more appropriate for evaluating the inputs (e.g., dollars spent on management), outputs (e.g., area of herbicide application), and outcomes (e.g., increase in a threatened population) of management (Hockings et al. 2009b).

Figure 1.

The six elements of the World Commission on Protected Areas framework for designing management effectiveness evaluations, represented as a cycle with evaluation at the core. Adapted from Hockings et al. (2006).

In the absence of quantitative data, expert opinion is the best information available about protected area management, so it is common for MEEs to rely heavily on the knowledge of protected area managers and stakeholders to conduct evaluations (Leverington et al. 2008). Even MEEs that rely predominantly on quantitative indicators of effectiveness will fill knowledge gaps using expert opinion (Timko & Innes 2009) when monitoring programs cannot provide these data. Expert opinion is commonly used in ecology where data are scarce (Kuhnert et al. 2010) because it assists in framing and informing conservation decisions (Cowling et al. 2003), is cost effective and readily available for urgent decisions (Lele & Allen 2006), can provide data not easily collected through quantitative methods (Hockings et al. 2009b), and can be used to synthesize and interpret existing quantitative data (Fazey et al. 2006). However, data derived from expert opinion can be susceptible to bias (Burgman 2001) and can give the impression of not being rigorous, regardless of its accuracy (McCarthy et al. 2004). While understanding the effectiveness of protected area management is important for achieving conservation outcomes, inaccurate or misleading evaluation data might be worse than no data (Walker et al. 2009), especially if it misdirects scarce resources or leads to poor decisions. The accuracy of expert opinion can vary (e.g., Johnson & Gillingham 2004; Martin et al. 2005), so it is important to understand the accuracy of management effectiveness data before we can be confident that these data will improve conservation outcomes.

A recent study of MEEs found evaluations in 6,200 protected areas from over 100 countries (Leverington et al. 2010), representing approximately 4.5% of protected areas globally. Despite the increasingly widespread use of MEEs in protected area management and the diversity of methods being used, no published studies measure the accuracy of evaluation data. MEEs rarely appear in the peer-reviewed literature (e.g., Goodman 2003; Timko & Innes 2009), with data either not being made public or results being summarized in the gray literature. Estimates of the accuracy of MEEs may likewise exist outside the published literature. The limited availability of MEEs inhibits the potential for those conducting similar evaluations to learn from the experience of others and improve the rigor of future evaluations. Therefore, we interviewed experts from around the world experienced in the development and implementation of MEEs to investigate opportunities to improve the rigor of future evaluations. We were interested in: (1) how the evaluation data were collected and used; (2) what criticisms had been made of MEEs by members of the conservation community; (3) whether any measures were taken to promote accuracy within evaluation data; and (4) whether the accuracy of MEE data had been measured. We discuss the experiences of evaluators in relation to what can be learnt about the conduct of future evaluations of protected area management.


Key informant interviews

Given the research questions, it was important to interview experts with an intimate knowledge of: (1) the design and implementation of at least one evaluation tool, (2) how evaluation data are used, and (3) the attitudes of the conservation community (i.e., protected area managers, academics, stakeholders, and nongovernment organizations) to MEE data. Therefore, we used a purposive sampling approach (Miles & Huberman 1994) to select key informants. Based on the data presented by Leverington et al. (2008), we targeted informants with more than 10 years' experience, who were knowledgeable about evaluation methods representing: (1) more than one country, (2) evaluations of more than 25 protected areas, and/or (3) uncommon approaches to data collection (e.g., quantitative evaluations) (Table 1). The selection criteria limited the sample to 13 informants (Table 1). However, because each evaluation tool is implemented according to a standardized methodology, the combined knowledge of these 13 informants represents evaluations from 4,692 protected areas in over 100 countries from all continents (Table 2), or 76% of the protected areas known to have been evaluated globally (see Leverington et al. 2010). Our sample includes both internationally applied MEE tools and evaluation methods developed for specific geographic regions (Table 2).

Table 1.  The relevant expertise of the key informants interviewed

| Informant | Academic qualification | Experience (years) | Employed by | Management effectiveness evaluation method | Involvement | Regions |
|---|---|---|---|---|---|---|
| 1 | PhD | >20 | Management agency | Management Effectiveness Study | Development and implementation | Europe |
| 2 | PhD | >20 | Management agency | Ecological Integrity Monitoring | Development and implementation | North America |
| 3 | PhD | >10 | Nongovernment organization | Management Effectiveness Tracking Tool | Implementation | Africa |
| 4 | Masters | >10 | Nongovernment organization | Management Effectiveness Tracking Tool | Implementation | All |
|  |  |  |  | Rapid Assessment and Prioritization of Protected Area Management | Implementation | All |
| 5 | Bachelor | >10 | Management agency | State of the Parks (New Zealand) | Development and implementation | Oceania |
| 6 | PhD | >10 | Management agency | System of Information, Monitoring and Evaluation for Conservation | Development and implementation | Latin America |
| 7 | Bachelor | >10 | Management agency | Site-Level Assessment | Development and implementation | Africa |
| 8 | Bachelor | >10 | Nongovernment organization | Site Tracking Tool | Development and implementation | Latin America, Africa, Asia, Indo-Pacific |
| 9 | Bachelor | >10 | Nongovernment organization | Enhancing Our Heritage | Development and implementation | Africa |
|  |  |  |  | Management Effectiveness Tracking Tool | Implementation | Africa |
|  |  |  |  | Rapid Assessment and Prioritization of Protected Area Management | Implementation | Africa |
| 10 | PhD | >20 | Government research agency | Enhancing Our Heritage | Development and implementation | Asia |
|  |  |  |  | Indian Management Effectiveness Study | Development and implementation | Asia |
| 11 | PhD | >10 | Nongovernment organization | Monitoring and Assessment with Relevant Indicators of Protected Areas of the Guianas | Development and implementation | South America |
|  |  |  |  | Programa Ambiental Regional para Centroamerica | Development and implementation | Central America |
|  |  |  |  | Rapid Assessment and Prioritization of Protected Area Management | Implementation | Latin America and the Caribbean |
| 12 | PhD | >20 | University | Enhancing Our Heritage | Development and implementation | Asia, Africa, and Latin America |
|  |  |  |  | State of the Parks (Australia) | Development and implementation | Oceania |
|  |  |  |  | State of the Parks (Korea) | Development and implementation | Asia |
|  |  |  |  | Management Effectiveness Tracking Tool | Development and implementation | All |
| 13 | Bachelor with honors | >15 | Consultant | Management Effectiveness Tracking Tool | Development and implementation | All |
|  |  |  |  | Enhancing Our Heritage | Development and implementation | Asia, Africa, and Latin America |
|  |  |  |  | Nature Parks Quality Campaign | Development | Europe |

Table 2.  The regions and coverage of management effectiveness evaluation methods represented in this study

| Management effectiveness evaluation method | Countries/regions* | Number of protected areas* |
|---|---|---|
| International methods |  |  |
| Enhancing Our Heritage | 9 countries in Asia, Africa, and the Americas | 9 |
| Management Effectiveness Tracking Tool | 86 countries in Africa, Asia, Europe, and the Americas | 1,150 |
| Rapid Assessment and Prioritization of Protected Area Management | 49 countries in Africa, Asia, Caribbean, Europe, and the Americas | >1,600 |
| Site Tracking Tool | 22 countries in Africa, Asia, Pacific, and the Americas | >130 |
| Site-Level Assessment | Egypt | 27 |
| Indian Management Effectiveness Study | India | >58 |
| State of the Parks | Korea | 39 |
| Management Effectiveness Study | Finland | 70 |
| Nature Parks Quality Campaign | Germany | 64 |
| Central and South America |  |  |
| Monitoring and Assessment with Relevant Indicators of Protected Areas of the Guianas | Guyana, Surinam, French Guyana | >10 |
| Programa Ambiental Regional para Centroamerica | 6 countries in Central America | 156 |
| System of Information, Monitoring, and Evaluation for Conservation | Mexico | 143 |
| State of the Parks | Australia | >1,180 |
| State of the Parks | New Zealand | 14 |
| North America |  |  |
| Ecological Integrity Monitoring | Canada | 42 |

*Information from Leverington et al. (2008, 2010).
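The coverage figures quoted in the text can be checked against Table 2 with a few lines of arithmetic. The sketch below is illustrative only. It assumes that the ">" lower bounds in Table 2 are taken at face value and that the global total is the 6,200 evaluated protected areas attributed to Leverington et al. (2010) earlier in the text. The dictionary and function names are hypothetical.

```python
# Protected-area counts per evaluation method, as listed in Table 2.
# Assumption: entries given as lower bounds (e.g., ">1,600") are used at face value.
TABLE_2_COUNTS = {
    "Enhancing Our Heritage": 9,
    "Management Effectiveness Tracking Tool": 1150,
    "Rapid Assessment and Prioritization of Protected Area Management": 1600,
    "Site Tracking Tool": 130,
    "Site-Level Assessment": 27,
    "Indian Management Effectiveness Study": 58,
    "State of the Parks (Korea)": 39,
    "Management Effectiveness Study": 70,
    "Nature Parks Quality Campaign": 64,
    "Monitoring and Assessment with Relevant Indicators of Protected Areas of the Guianas": 10,
    "Programa Ambiental Regional para Centroamerica": 156,
    "System of Information, Monitoring, and Evaluation for Conservation": 143,
    "State of the Parks (Australia)": 1180,
    "State of the Parks (New Zealand)": 14,
    "Ecological Integrity Monitoring": 42,
}

# Evaluated protected areas worldwide, as cited in the text (Leverington et al. 2010).
GLOBAL_EVALUATED = 6200


def coverage(counts, global_total):
    """Return (protected areas in the sample, percentage of the global total)."""
    sampled = sum(counts.values())
    return sampled, round(100 * sampled / global_total)


sampled, percent = coverage(TABLE_2_COUNTS, GLOBAL_EVALUATED)
print(f"{sampled} protected areas, about {percent}% of those evaluated globally")
```

Under these assumptions, the 15 methods sum to the 4,692 protected areas and the 76% coverage reported above.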

Twelve of the 13 informants participated in face-to-face, semistructured interviews about the evaluation tools they were familiar with; standard prompts were used to encourage informants to provide additional information where necessary (Table 3). The 13th informant could not be interviewed but provided written responses to the interview questions.

Table 3.  The interview protocol. Where informants gave short or uninformative answers they were asked “could you tell me a little bit more about that?”
What management effectiveness approach(es) have you been involved with?
Which organizations were involved in the development and implementation of this method?
In which countries has this method been used to evaluate management effectiveness and for how many protected areas?
What is the balance between qualitative and quantitative indicators?
What was the purpose of the evaluation?
How many years has the program been running?
What measures are used to try and promote accuracy and consistency?
Has anyone ever raised concerns about quality of the information produced by the technique?
Has anyone ever tried to measure the accuracy of the data produced by these assessments?

Statistical tests were not appropriate to investigate differences between evaluation methods and regions because of the nature of the sample—some evaluation tools were described by more than one informant, and some informants described more than one evaluation tool. Where multiple informants described the same evaluation tool their responses were pooled. There were no cases where informants provided contradictory information about the same tool. As an alternative to statistical tests, we used qualitative analytical methods, using categories to code similar responses (Patton 2002). We present results as either the percentage (%) or number (n) of the 15 evaluation methods described. Where relevant, we discuss the observed patterns in relation to the broad evaluation approaches and the regions where the tools are used.
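Because results are reported interchangeably as counts (n) and percentages of the 15 evaluation methods, a small conversion helper makes the two forms easy to cross-check. This is a minimal sketch, and the function names are hypothetical. It assumes that percentages are rounded to the nearest whole number.

```python
N_METHODS = 15  # number of evaluation methods described by the informants


def as_percent(n, total=N_METHODS):
    """Convert a count of evaluation methods to a whole-number percentage."""
    return round(100 * n / total)


def as_count(percent, total=N_METHODS):
    """Recover the count of evaluation methods behind a reported percentage."""
    return round(percent * total / 100)


# Example: 11 tools required assessors to review relevant information.
print(as_percent(11))
# Example: the count that corresponds to a reported 93% of methods.
print(as_count(93))
```

For example, 11 of the 15 methods corresponds to 73%, and a reported 93% corresponds to 14 methods.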


Collection and use of evaluation data

Nine of the 15 evaluation methods described included indicators from all six elements of the WCPA's MEE framework (see Hockings et al. 2006; Table 4). Evaluations were described as based on largely qualitative (n= 12) or largely quantitative (n= 3) indicators of management for all, or most, of the six elements of the framework (Table 4). While informants described indicators as qualitative (e.g., the information to guide management is poor, fair, good, or very good) or quantitative (e.g., number of occupied nest sites), they were differentiated by how the indicators were answered (Table 4).

Table 4.  The type of indicators used within the evaluation methods, grouped according to the elements of the management effectiveness framework (see Hockings et al. 2006). Qualitative indicators are separated into those based on personal judgment, possibly after reference to supporting documentation, such as planning documents or empirical data, when it is available (+) and those that included an option to also provide a numeric value such as the number of planning documents (++). Quantitative indicators are separated into those populated by data from monitoring programs (++++) or those predominantly answered according to data from monitoring programs but using expert opinion if data are not yet available (+++)
WCPA framework elementReserve contextManagement planningInputsManagement processesOutputsOutcomes
Na = not assessed;
+ = purely qualitative indicators (but might make reference to available information);
++ = largely qualitative indicators (but may include some numeric indicators, for example, number of management plans);
+++ = largely quantitative (but may rely on expert opinion when data are insufficient);
++++ = purely quantitative.

Evaluation method
 Ecological Integrity MonitoringNaNaNa+Na+++
 Enhancing Our Heritage+++++++++++
 Indian Management Effectiveness Study+++++++
 Management Effectiveness Study+++++++
 Management Effectiveness Tracking Tool++++++
 Monitoring and Assessment with Relevant Indicators of Protected Areas of the Guianas++++Na+
 Nature Parks Quality Campaign++++NaNa
 Programa Ambiental Regional para Centroamerica++++Na+
 Rapid Assessment and Prioritization of Protected Area Management++Na++Na
 Site-Level Assessment++++++++
 Site Tracking Tool+++++++
 State of the Parks (Australia)+++++++
 State of the Parks (New Zealand)NaNaNa++++++
 State of the Parks (Korea)++++++
 System of Information, Monitoring, and Evaluation for Conservation++++++

The evaluations were generally self-assessments completed by protected area managers (n= 13). Sometimes these assessments were completed during workshops with managers (n= 8), often including representatives from stakeholder groups (n= 5). Alternatively, evaluations were completed by just the site-level manager(s) (n= 7), then “peer reviewed” by other managers (n= 1) or by an expert panel comprising managers and scientists (n= 1). Most assessment tools were reported to rely heavily on the personal experience of managers, but 11 evaluation tools included a process to ensure any relevant information was reviewed by assessors prior to completing the assessments (Figure 2). While the availability of supporting information will vary between protected areas, the lack of quantitative data is a frequently cited reason for qualitative self-assessment by protected area managers (Hockings & Phillips 1999), and data scarcity was a common theme amongst the informants.

Figure 2.

The techniques used to improve the consistency of evaluation data as described by key informants.

As an alternative to qualitative, self-assessment tools, a few MEE methods (n= 3 or 1% of the protected areas addressed by our sample) focus on quantitative indicators of management outcomes (e.g., stream quality index greater than five) derived from specially designed monitoring programs. With the exception of the Enhancing Our Heritage tool, the quantitative MEE methods are used in developed countries and collect data for only a subset of the WCPA framework elements (Table 4). Despite being linked to quantitative monitoring programs, data are not always available to address quantitative indicators of MEEs because of funding or time constraints. In these cases, indicators were reportedly left incomplete or answered using expert opinion (e.g., 10–15% of indicators; Timko & Innes 2009).

Informants described many purposes for evaluation data. Most commonly, management agencies aimed to: collect data that would improve management outcomes (93%), indicate conservation outcomes to stakeholders and financial donors (60%), guide management priorities and resource allocation (47%), and build community support for protected area management (47%).

Criticisms of assessment methods and measures used to promote accuracy and consistency

Informants reported that criticisms of MEE methods had been made by academics, members of nongovernment organizations, and protected area managers, and that the most common concerns related to the indicators used and the confidence associated with the resulting data (Figure 3). Quantitative MEE tools were not immune to criticism, with two informants reporting that the nature of the indicators used was questioned, along with whether the empirical data collection methods would accurately reflect conservation outcomes. Despite the prevalence of self-assessment tools and the potential for assessors to manipulate evaluation results because of motivational biases (see Miller & Ross 1975), only one informant reported a criticism that the assessment process may be open to abuse. The informant noted that critics from academia and NGOs assumed that assessors might inflate the success of management if they were concerned that poor results would reflect badly on their job performance, while protected area managers assumed that assessors may understate conservation successes to attract additional resources for management. Both criticisms may be valid; however, it would be necessary to validate evaluation results to determine whether assessors were introducing systematic bias into evaluation data. The poor performance reported by MEEs globally (22% of protected areas report sound management; Leverington et al. 2010) suggests assessors are not inflating their success.

Figure 3.

The criticisms informants reported were made of management effectiveness evaluations. Black bars indicate criticisms made by academics, nongovernment organizations, and protected area managers. Gray bars indicate criticisms made by nongovernmental organizations. White bars indicate criticisms made by academics and nongovernment organizations (n= 15).

Informants recognized the susceptibility of both qualitative and quantitative MEEs to error and reported many ways that evaluation processes had been adapted to respond to these criticisms (Figure 2). Up to five different measures per tool were employed to improve the accuracy of the evaluation data and build confidence in the evaluation results (Figure 2). Most commonly, assessors reviewed existing quantitative data prior to the evaluation (73%).

Providing assessors with training (n= 6) and conducting evaluations using a workshop format (n= 8) were used to allow assessors to discuss their personal interpretations and share knowledge about the protected area (Figure 2). While allowing experts to discuss the assumptions that underpin their beliefs can reduce the linguistic uncertainty in risk assessments (Carey & Burgman 2008), we found no evidence that similar tests have been conducted for MEEs. Less commonly, MEE tools include independent experts in the evaluation process (n= 4; Figure 2). Informants reported that external experts were used to bring objectivity to evaluation processes to avoid the perception that protected area managers may misrepresent management outcomes when they complete self-assessments (Figure 3).

Measuring the accuracy of evaluation data

Only one attempt to estimate the accuracy of MEEs was reported (State of the Parks, Australia), despite all informants acknowledging that it would be desirable (n= 13). The results of the validation study have not been published, but a sample of qualitative self-assessments is being validated using quantitative data, and the precision of assessments made by different managers is being estimated. When discussing the need to validate MEEs, some informants described concerns about the difficulty of validating qualitative evaluation data (n= 2) and that validation may divert resources away from data collection (n= 2).
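The two quantities mentioned above, accuracy (agreement between a self-assessment and independent quantitative data) and precision (agreement between different assessors), could be estimated along the following lines. This is a minimal sketch and not the method used in the unpublished validation study. All ratings below are invented for illustration, the reference ratings are assumed to have been derived from monitoring data, and Cohen's kappa is one possible chance-corrected agreement statistic chosen here as an assumption.

```python
from collections import Counter


def percent_agreement(ratings_a, ratings_b):
    """Proportion of indicators on which two sets of ratings match exactly."""
    if len(ratings_a) != len(ratings_b) or not ratings_a:
        raise ValueError("rating lists must be non-empty and of equal length")
    matches = sum(a == b for a, b in zip(ratings_a, ratings_b))
    return matches / len(ratings_a)


def cohens_kappa(ratings_a, ratings_b):
    """Agreement between two raters, corrected for agreement expected by chance."""
    n = len(ratings_a)
    observed = percent_agreement(ratings_a, ratings_b)
    freq_a, freq_b = Counter(ratings_a), Counter(ratings_b)
    categories = set(freq_a) | set(freq_b)
    expected = sum(freq_a[c] * freq_b[c] for c in categories) / n ** 2
    if expected == 1:
        return 1.0  # both raters used a single identical category throughout
    return (observed - expected) / (1 - expected)


# Hypothetical ratings for eight indicators on a poor/fair/good/very good scale.
manager_a = ["poor", "fair", "good", "good", "very good", "fair", "poor", "good"]
manager_b = ["poor", "fair", "fair", "good", "very good", "fair", "fair", "good"]
# Hypothetical reference ratings, assumed to come from quantitative monitoring.
reference = ["poor", "fair", "good", "good", "good", "fair", "poor", "good"]

accuracy = percent_agreement(manager_a, reference)  # assessor vs. monitoring data
precision = cohens_kappa(manager_a, manager_b)      # assessor vs. assessor
print(f"accuracy = {accuracy:.2f}, precision (kappa) = {precision:.2f}")
```

Exact-match agreement ignores how far apart two ratings are on the ordinal scale; a weighted statistic might be preferable where near misses should count for more than large disagreements.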

As an alternative to formal validation of MEEs, informants reported that the strengths and weaknesses of the evaluation process were reviewed after the initial implementation phase (n= 8; Figure 2). These reviews focused on refining qualitative assessment criteria and/or data analyses. Quantitative evaluation methods generally used a peer-review process to ensure rigorous data collection methods (n= 2; Figure 2). While the measures described in Figure 2 are theoretically sound, we found no tests of whether they had improved the quality of evaluation data.


We found that most MEEs were qualitative self-assessments of management performance, judged by protected area managers, sometimes including stakeholders in the process. While a small number of MEE tools are based on empirical monitoring of management outcomes, the lack of quantitative data and high costs of data collection were commonly cited reasons for designing evaluation methods around management experience. All approaches to MEEs have strengths and weaknesses. We found that there were always some criticisms of evaluation methods and informants have developed many approaches to strengthen the rigor of the data collection methods. The wealth of experience of those who have developed and implemented MEEs is an important resource that provides an opportunity to improve the rigor of future evaluations.

The major criticism made of qualitative evaluation methods was that expert opinion can be prone to bias (Burgman 2001), with consequences for data quality. Consensus-building techniques, such as data collection workshops, were commonly used to address concerns about rigor in qualitative evaluation tools because these techniques make assumptions explicit (see Ziglio 1996) and reduce the error associated with the variable interpretation of language (Carey & Burgman 2008). Providing training to assessors prior to the assessment provides a further opportunity to clarify assumptions and standardize the interpretation of qualitative indicators. Bringing assessors together for workshops and training is costly, but it is likely not only to improve the quality of evaluation data but also to yield benefits by allowing managers to share knowledge.

To help managers evaluate their protected areas, all of the qualitative MEEs ask them to review existing data prior to the evaluation. Assimilating multiple sources of information, including personal knowledge derived from the day-to-day management of the protected area, is a practical way to base evaluations on the best available knowledge while ensuring that the completion of an evaluation does not rely on the presence of data. However, it is unclear whether this process improves the accuracy of evaluation data, especially considering that even in a relatively well-resourced protected area network, quantitative data are only used to inform 10% of evaluations (Cook et al. 2010).

Self-assessment methods suffer from criticisms about the independence of the data. We found that external experts are increasingly being employed in the evaluation process to build independence into it. The cost involved in using independent experts limits the number of protected areas that can be evaluated (i.e., a maximum of 70 protected areas; Table 2) but may be justified for small protected area networks or where only key protected areas are being evaluated.

While there may be concerns about the rigor and independence of self-assessment methods, there are also real benefits to these approaches. Setting clear management objectives is vital to effective management (Possingham 2001) and to evaluating the success of management activities. Involving protected area managers in the evaluation process demonstrates the importance of setting clear objectives, which will ultimately benefit the day-to-day management of the protected area. Another important benefit of involving managers in the evaluation is that the evaluation data are more likely to be used to improve management (Patton 2008). Qualitative indicators based on personal experience may also be able to provide information that purely quantitative methods cannot. While setting thresholds for quantitative indicators can indicate whether a target has been reached, understanding why or how may only be achieved if coupled with contextual information. For example, an increase in weeds in a reserve may not be the result of poor management but of thousands of weed propagules being deposited after a recent flood. Therefore, the input of protected area managers in evaluations to capture context will be valuable even for quantitative MEE tools.

While there are many benefits to self-assessment methods and a great deal of effort has been invested in improving the rigor of evaluation methods, it is unclear whether any of these approaches lead to more accurate evaluation data. The majority of informants had not evaluated the accuracy of evaluation data but only reviewed and revised the tools in an attempt to learn from past mistakes. While learning and so improving the evaluation tool is likely to be an ongoing process, there is a danger that changing the indicators too frequently will prevent any analysis of trends in the data. Despite the value of reviewing the evaluation process (Hockings et al. 2009a), without validating the accuracy of the assessments, criticisms of the rigor of evaluation data are likely to remain.

Of the 15 evaluation methods we documented in this study, which represent approximately 80% of the protected areas known to have conducted MEEs (see Leverington et al. 2010), we found only one attempt to measure the accuracy of evaluation data. All of the informants we interviewed acknowledged the benefits of validating evaluation results, but measuring the accuracy of personal opinion is challenging (Burgman 2001) and there are costs associated with validating evaluation data. However, failing to validate management effectiveness data could prove to be a false economy if evaluations are inaccurate. Given the importance of understanding the effectiveness of protected area management, and the fact that most evaluations were reportedly being used to “improve management,” we believe it is important to both conduct MEEs and measure the accuracy of the data they generate.

The knowledge of the informants we interviewed provides a valuable opportunity to learn from their considerable experience. Many evaluation tools are based on theoretically sound methods adapted to realistic constraints on the availability of quantitative evidence and limited funds for evaluation. Despite the additional costs associated with using more robust data collection methods, we found no evidence that these measures were cost effective at increasing the accuracy of MEEs. If done strategically, validating MEEs could be used not only to indicate the accuracy of assessors’ personal judgments but also to test the value of some of the commonly used methodological processes and so inform the design of future MEEs. We strongly encourage anyone developing or implementing MEEs to make validating evaluation data part of the evaluation process and to publish both the evaluation data and the methodological lessons learnt. While management agencies may be reluctant to publicize evaluation data, sharing this information allows everyone to benefit from those lessons and to improve the rigor of future evaluations.


We thank all the key informants who participated in this study. CNC was funded by the Australian Research Council, NSW Department of Environment, Climate Change and Water, Parks Victoria, the Department of Sustainability, Environment, Water, Population and Communities, and a University of Queensland travel grant. We also thank four anonymous reviewers for comments that improved this manuscript.