## Introduction

Markov decision processes (MDPs) have come to play an increasingly important role in conservation planning research, forming the basic model for recent investigations into metapopulation management (e.g. Westphal *et al*. 2003), invasive species control (e.g. Bogich & Shea 2008), translocation (e.g. Tenhumberg *et al*. 2004) and sequential reserve selection (e.g. Costello & Polasky 2004). The goal in such applications is to construct a *policy* that associates each state of the system with a particular action. This policy should offer optimal performance in the sense of maximizing or minimizing a specified conservation objective (Possingham *et al*. 2001).

A standard technique for solving finite-horizon MDPs is backward induction. This dynamic programming algorithm relies on an extensional representation of the state space and explicit enumeration (i.e. every state is visited at every time step) to derive the optimal policy. Specifying the effects of actions in terms of state transitions can be problematic, however, because the size of the state space grows exponentially with the number of state variables. For example, a problem with 15 binary (0 or 1) state variables (2^15 = 32 768 states) would require transition matrices with 32 768 × 32 768 = 1 073 741 824 entries to represent the effects of each action. This ‘curse of dimensionality’ (Bellman 1961) limits the feasibility of specifying and solving MDPs in the context of real-world conservation planning.
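The backward-induction sweep can be sketched for a toy two-state problem. Everything below (the two states, the hypothetical ‘burn’/‘nothing’ actions, the transition probabilities and rewards) is an illustrative assumption, not taken from any cited study; the point is only that every state is visited at every time step, working backwards from the horizon.

```python
# Toy finite-horizon MDP solved by backward induction (a minimal sketch).
# P[a][s][s2] = probability of moving from state s to s2 under action a;
# R[a][s] = immediate reward for taking action a in state s.
P = {
    "burn":    [[0.8, 0.2], [0.6, 0.4]],
    "nothing": [[0.3, 0.7], [0.1, 0.9]],
}
R = {"burn": [0.0, 1.0], "nothing": [0.5, 0.0]}
states, actions, T = [0, 1], ["burn", "nothing"], 5

V = [0.0, 0.0]          # terminal values at the horizon
policy = []             # policy[t][s] = optimal action at time t in state s
for t in range(T):      # sweep backwards: every state, every time step
    Q = {a: [R[a][s] + sum(P[a][s][s2] * V[s2] for s2 in states)
             for s in states]
         for a in actions}
    policy.insert(0, [max(actions, key=lambda a: Q[a][s]) for s in states])
    V = [max(Q[a][s] for a in actions) for s in states]
```

With 2 states this is trivial; the text's point is that the same sweep over 2^15 states needs billion-entry transition matrices per action.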

Much emphasis in the artificial intelligence community has been placed on circumventing the curse of dimensionality by means of approximation (for discussion, see Boutilier, Dean & Hanks 1999; Li, Walsh & Littman 2006; Chapter 6 of Bertsekas 2007; Powell 2007). One class of methods restricts search to locally accessible regions of the state space (for examples in conservation, see Nicol *et al*. 2010; Nicol & Chades 2011). These methods exploit the fact that the current state of the system may be known, forming the basis of what we view as an abbreviated version of decision tree search. Each action at the current state forms the first level of the tree. A generative model is then used to simulate the possible future states given each action; these are placed at the second level of the tree. The third level holds the actions applicable at the second-level states, and so on, looking ahead a defined number of steps. A sub-MDP calculation is then carried out on the simulated future states, which approximates the optimal action for the root of the search tree (i.e. the current state). The important point is that attention is restricted to the locally accessible regions of the state space. This can have advantages over conventional dynamic programming techniques, especially if only a fraction of the state space is connected to the current state within a given number of look-ahead steps (Boutilier, Dean & Hanks 1999).
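The abbreviated tree search described above can be sketched as follows. The generative model, the integer states and the sampling width are hypothetical placeholders (a real application would substitute its own simulator); the sketch only shows the structure: actions at the root, sampled successors one level down, and a recursion that bottoms out at the look-ahead horizon.

```python
import random

def actions(state):
    """Actions applicable in a state (hypothetical: always two)."""
    return [0, 1]

def simulate(state, action, rng):
    """Generative model: sample a successor state and reward (toy dynamics)."""
    next_state = (state + action + rng.choice([0, 1])) % 8
    reward = 1.0 if next_state == 0 else 0.0
    return next_state, reward

def q_value(state, action, depth, rng, width=4):
    """Average sampled return of `action`, looking `depth` steps ahead."""
    total = 0.0
    for _ in range(width):                    # sample `width` successors
        s2, r = simulate(state, action, rng)
        total += r + value(s2, depth - 1, rng, width)
    return total / width

def value(state, depth, rng, width=4):
    if depth == 0:                            # look-ahead horizon reached
        return 0.0
    return max(q_value(state, a, depth, rng, width) for a in actions(state))

def best_action(state, depth, rng, width=4):
    """Approximates the optimal action for the root (current) state only."""
    return max(actions(state), key=lambda a: q_value(state, a, depth, rng, width))

rng = random.Random(1)
chosen = best_action(state=6, depth=3, rng=rng)
```

Only states reachable from the root within `depth` steps are ever touched, which is the locality property the text emphasizes.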

The primary disadvantage of these local search methods is that many states are ignored in policy construction. The work of Nicol *et al*. (2010) and Nicol & Chades (2011) can be viewed as applications of ‘online’ methods that handle the problem in a serial fashion: they sacrifice the optimal policy for a fast, local approximation that applies only to the current state. Here, we explore an approach that also sacrifices the optimal policy but does so for an approximation that applies to every state in the state space. We describe an abstraction method proposed by Dearden & Boutilier (1997). The method is a form of state aggregation; more specifically, we use the reward structure of the problem to select a subset of the state variables whose impact on the value of a state is minimal, negligible or absent. These state variables are then deleted from the problem description. The abstract state space (which is exponentially smaller than the original) is found by aggregating all states that agree on the values of the remaining state variables. The idea is to capture the most important aspects of the original MDP, find an optimal policy over this reduced space and use it as an approximate solution to the original problem. We first review MDPs and the backward induction algorithm; we then present the abstraction method, discuss its shortcomings and illustrate our conclusions with two problems in conservation planning.
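The aggregation step can be illustrated concretely. Suppose, hypothetically, a problem with six binary state variables of which only the first two affect the reward; deleting the other four collapses the 2^6 = 64 original states into 2^2 = 4 abstract states, each representing 16 originals that agree on the retained variables.

```python
from itertools import product

# Hypothetical example: 6 binary state variables, reward depends only on
# the first two, so the remaining four are deleted by the abstraction.
n_vars = 6
relevant = (0, 1)   # indices of the state variables retained

def abstract(state):
    """Project a full state onto the retained variables."""
    return tuple(state[i] for i in relevant)

full_space = list(product((0, 1), repeat=n_vars))           # 2^6 = 64 states
abstract_space = sorted({abstract(s) for s in full_space})  # 2^2 = 4 states

# Each abstract state aggregates every full state that agrees on `relevant`;
# a policy computed over the abstract space is applied to all of its members.
groups = {}
for s in full_space:
    groups.setdefault(abstract(s), []).append(s)
```

Backward induction then runs over the 4 abstract states rather than the 64 originals, which is the exponential saving the text describes.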