## Introduction

Numerous problems in ecology involve making decisions about the best option among a set of competing strategies. These so-called optimization problems can be solved using mathematical procedures such as linear programming (Nash & Sofer 1996), which determines maximum benefits or minimum costs given some objectives and under some constraints for deterministic systems assumed to be at equilibrium. If uncertainty in the dynamics of the system needs to be accounted for, a Markov decision process (MDP; Puterman 1994; Williams 2009) model is usually adopted. ‘MDPs are models for sequential decision making when outcomes are uncertain’ (Puterman 1994). An MDP has two components: a Markov chain that models the uncertain future states of the system given an initial state, and a decision model. First, an MDP is a Markov chain in which the system undergoes successive transitions from one state to another through time. For example, these state transitions can correspond to the change in population size from one year to the next. In a Markov chain, the transition to a future state depends only on the current state of the system. In other words, the state of the system at time step *t* provides sufficient information to predict the state of the system at time step *t*+1. Second, an MDP involves a decision-making process in which an action is implemented at each sequential state transition.
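
As a concrete illustration of the Markov property, consider the following minimal sketch (written in Python purely for illustration; the code accompanying this article is in R). It defines a hypothetical two-state chain for a population that is either "low" or "high" and draws the next state using the current state alone; all transition probabilities are invented placeholders.

```python
import random

# Hypothetical two-state Markov chain for a population ("low" or "high").
# The transition probabilities are invented placeholders, not estimates
# for any real system. Each row sums to 1.
P = {
    "low":  {"low": 0.7, "high": 0.3},
    "high": {"low": 0.4, "high": 0.6},
}

def step(state, rng=random.random):
    """Draw the next state: it depends only on the current state
    (the Markov property), not on the earlier history."""
    return "low" if rng() < P[state]["low"] else "high"
```

Because the draw uses only `state`, a trajectory simulated by calling `step` repeatedly satisfies the Markov property: the state at time *t* is all that is needed to predict the state at time *t*+1.
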
In the conservation and wildlife management literature, the term stochastic dynamic programming (SDP) is often used to refer to both the mathematical model (MDP) and its solution techniques (SDP *per se*). MDPs are usually modelled and solved through several successive steps: defining the objectives and formalizing them as a mathematical function of costs and/or benefits (Williams, Nichols & Conroy 2002); defining the possible states of the system, monitoring the system and making statistical inference on its behaviour (Nichols & Williams 2006); defining a set of alternative actions that influence the performance of the system; building a dynamic model that describes the system's transitions from one state to another under every possible decision; and finally determining the optimal strategy, that is, the set of decisions expected to best fulfil the objectives over time (Runge 2011). The objectives are formalized in a utility function that prioritizes desired outcomes by evaluating the benefits of any decision for the system (Williams, Nichols & Conroy 2002). MDP models highlight the trade-off between obtaining current utility and altering the opportunities to obtain utility in the future. Such problems abound in ecology because decisions taken today often have important implications for the future behaviour of biological systems.
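
The ingredients listed above can be sketched as follows (Python for illustration; the article's code is in R). The states, actions, transition probabilities and utilities are all hypothetical placeholders, not values from any real system; one Markov chain is attached to each action, and the utility function scores each action in each state.

```python
# Illustrative MDP ingredients following the steps above; every number
# is an invented placeholder.
states = ["low", "high"]                    # possible states of the system
actions = ["do_nothing", "manage"]          # alternative actions

# Dynamic model: transition probabilities P[a][s][s2], one Markov chain
# per action, describing how the system moves from state s to s2 under a.
P = {
    "do_nothing": {"low":  {"low": 0.8, "high": 0.2},
                   "high": {"low": 0.3, "high": 0.7}},
    "manage":     {"low":  {"low": 0.4, "high": 0.6},
                   "high": {"low": 0.1, "high": 0.9}},
}

# Utility function R[s][a]: benefit of taking action a in state s
# (here managing carries an immediate cost).
R = {"low":  {"do_nothing": 0.0, "manage": -1.0},
     "high": {"do_nothing": 5.0, "manage": 4.0}}

def one_step_utility(s, a, value):
    """Immediate utility of action a in state s, plus the expected
    value of the next state under the values in `value`."""
    return R[s][a] + sum(P[a][s][s2] * value[s2] for s2 in states)
```

The function `one_step_utility` makes the current-versus-future trade-off explicit: its first term is the utility obtained now, and its second term is how the chosen action alters the opportunities to obtain utility in the future.
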

Stochastic dynamic programming is an optimization technique for solving MDPs and is well suited to the nonlinear and stochastic processes that characterize many biological systems. Whereas the time dimension is often neglected in optimization procedures such as classical linear or nonlinear programming, SDP determines state-dependent optimal decisions that vary over time (Williams, Nichols & Conroy 2002). Indeed, SDP is acknowledged as one of the best tools for making recurrent decisions under the uncertainty inherent to biological systems (Possingham 1997, 2001; Wilson *et al*. 2006; Chadès *et al*. 2011).

The principle of SDP relies on partitioning a complex problem into simpler subproblems across several time steps that, once solved, are combined to give an overall solution (Mangel & Clark 1988; Lubow 1995; Clark & Mangel 2000). SDP was first developed and used in applied mathematics, economics and engineering (Bellman 1957; Intriligator 1971) and has since gained attention in ecology (Mangel & Clark 1988; Shea & Possingham 2000). A pioneering use of SDP was in behavioural ecology, to determine the breeding and foraging strategies that maximize individual fitness (Houston *et al*. 1988; Mangel & Clark 1988; Ludwig & Rowe 1990). Early work in resource management included applications to pest control (Winkler 1975) and fisheries management (Walters 1975; Reed 1979). In conservation biology, SDP has been used successfully to produce evidence-based management recommendations (optimization of resource allocation: Westphal *et al*. 2003; Martin *et al*. 2007; Chadès *et al*. 2011; management of natural resources in the context of global change: Martin *et al*. 2011). In forestry, SDP has been used to balance the protection of biological diversity against sustainable timber production (Lembersky & Johnson 1975; Teeter, Somers & Sullivan 1993; Richards, Possingham & Tizard 1999). Stochastic dynamic programming has also been implemented in studies aiming to control the spread of weeds, pests or diseases (Shea, Thrall & Burdon 2000; Baxter & Possingham 2011; Pichancourt *et al*. 2012), to determine the best water management policies (Martin *et al*. 2009) or to enhance the efficiency of a biocontrol agent (Shea & Possingham 2000). In wildlife management, SDP has often been used to find optimal harvest rates for populations (Johnson *et al*. 1997; Milner-Gulland 1997; Spencer 1997; Martin *et al*. 2010).
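
A minimal sketch of this divide-and-combine principle is finite-horizon backward induction (Python for illustration; the article's code is in R): the decision subproblem at the last time step is solved first, and its solution is then reused to solve each earlier step. The states, actions, probabilities and utilities below are invented placeholders.

```python
# Finite-horizon backward induction (Bellman 1957): solve the subproblem
# at the final time step first, then combine it with each earlier step's
# decision to build the overall optimal strategy.
# All states, actions, probabilities and utilities are invented.
states = ["low", "high"]
actions = ["do_nothing", "manage"]

# P[a][s][s2]: probability of moving from state s to s2 under action a.
P = {
    "do_nothing": {"low":  {"low": 0.8, "high": 0.2},
                   "high": {"low": 0.3, "high": 0.7}},
    "manage":     {"low":  {"low": 0.4, "high": 0.6},
                   "high": {"low": 0.1, "high": 0.9}},
}

# R[s][a]: immediate utility of action a in state s (managing costs 1).
R = {"low":  {"do_nothing": 0.0, "manage": -1.0},
     "high": {"do_nothing": 5.0, "manage": 4.0}}

def backward_induction(T):
    """Return optimal state values and the time-dependent optimal policy
    over a horizon of T time steps."""
    V = {s: 0.0 for s in states}        # values at the terminal time
    policy = []
    for _ in range(T):                  # work backwards from the horizon
        # Q[s][a]: utility obtained now plus expected value of next state.
        Q = {s: {a: R[s][a] + sum(P[a][s][s2] * V[s2] for s2 in states)
                 for a in actions} for s in states}
        policy.insert(0, {s: max(Q[s], key=Q[s].get) for s in states})
        V = {s: max(Q[s].values()) for s in states}
    return V, policy
```

With a horizon of two steps, these placeholder numbers already show why optimal decisions are state- and time-dependent: at the first step, managing in the "low" state is optimal despite its immediate cost, because it raises the probability of reaching the valuable "high" state later, whereas at the final step there is no future to invest in and doing nothing is optimal.
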

Despite the flexible nature of SDP and its ability to solve important decision-making problems in ecology, its transfer to ecologists has been difficult. One reason for the slow uptake is the mathematical knowledge required to implement SDP. Here, we provide a primer on SDP for ecologists. We introduce the main concepts of SDP, provide a step-by-step procedure to implement dynamic programming in a deterministic system and illustrate how to make decisions in the presence of uncertainty. We demonstrate the applicability of SDP by applying the approach to data from a wolf population controlled by culling. We provide R code to run the models, as well as procedures in specialized toolboxes implementing SDP that can conveniently be amended for one's own purposes.