Flexible and practical modeling of animal telemetry data: hidden Markov models and extensions


  • Roland Langrock,

    1. Centre for Research into Ecological and Environmental Modelling, School of Mathematics and Statistics, The Observatory, Buchanan Gardens, University of St Andrews, St Andrews, Fife KY16 PLZ Scotland, United Kingdom
  • Ruth King,

    1. Centre for Research into Ecological and Environmental Modelling, School of Mathematics and Statistics, The Observatory, Buchanan Gardens, University of St Andrews, St Andrews, Fife KY16 PLZ Scotland, United Kingdom
  • Jason Matthiopoulos,

    1. Scottish Oceans Institute, School of Biology, University of St Andrews, St Andrews, Fife KY16 8LB Scotland, United Kingdom
  • Len Thomas,

    1. Centre for Research into Ecological and Environmental Modelling, School of Mathematics and Statistics, The Observatory, Buchanan Gardens, University of St Andrews, St Andrews, Fife KY16 PLZ Scotland, United Kingdom
  • Daniel Fortin,

    1. Centre d'Étude de la Forêt and Département de Biologie, Université Laval, Québec City, Quebec G1V 0A6 Canada
  • Juan M. Morales

    1. Ecotono, INIBIOMA-CONICET, Universidad Nacional del Comahue, Quintral 1250, 8400 Bariloche, Argentina

  • Corresponding Editor: B. D. Inouye.


We discuss hidden Markov-type models for fitting a variety of multistate random walks to wildlife movement data. Discrete-time hidden Markov models (HMMs) achieve considerable computational gains by focusing on observations that are regularly spaced in time, and for which the measurement error is negligible. These conditions are often met, in particular for data related to terrestrial animals, so that a likelihood-based HMM approach is feasible. We describe a number of extensions of HMMs for animal movement modeling, including more flexible state transition models and individual random effects (fitted in a non-Bayesian framework). In particular we consider so-called hidden semi-Markov models, which may substantially improve the goodness of fit and provide important insights into the behavioral state switching dynamics. To showcase the expediency of these methods, we consider an application of a hierarchical hidden semi-Markov model to multiple bison movement paths.


Analyzing animal movement is essential for understanding processes such as inter- and intraspecific interactions, the dynamics of populations, and their distribution in space. Recent research has focused on dissecting movement patterns into different behavioral states, each with an associated set of parameters governing the movement process. This was accompanied by a focus on Bayesian methods, which allow inference for models with analytically intractable likelihoods (Morales et al. 2004, Jonsen et al. 2005, McClintock et al. 2012). Many movement models proposed in recent years in fact belong to the class of hidden Markov models (HMMs), with only a handful of implementations of likelihood-based (i.e., classical, non-Bayesian) HMM methodology for movement data (Franke et al. 2006, Holzmann et al. 2006, Patterson et al. 2009, Pedersen et al. 2011b). The principal aim of this manuscript is to outline how basic HMM-type movement models can be extended in different ecologically interesting and novel ways, thereby demonstrating that the full potential of HMMs for modeling animal movement has yet to be realized. HMMs cannot deal with all problems typically encountered with telemetry data; see the sections HMM basics and Discussion. However, important advantages of using HMMs are computational tractability and mathematical simplicity. Thus, for many applications they constitute a competitive alternative to more flexible, yet more challenging, state-space models (SSMs) and associated Bayesian methods (Patterson et al. 2008).

Our presentation is as follows. The section HMM Basics introduces the relevant HMM theory. The section HMMs for a single animal's movement discusses how a single animal's movement path can be described using HMMs. Basic HMMs can be extended in various ways, and in particular by allowing for more complicated dependence structures and by including random effects. We outline such extensions and the associated statistical methodology in the section Extensions. In the section Application: bison in Saskatchewan, Canada, the implementation of two important extensions is demonstrated by applying a hierarchical hidden semi-Markov model to multiple bison movement paths.

HMM Basics

Model formulation

An HMM is a time series model that comprises two components, an observable series and an underlying, non-observable state sequence. The observed variables, which we denote by Z1, … , ZT, in the context of this paper relate to some quantification of movement. When modeling movement data, one can either model the positions themselves or some derived quantities. In the models that we consider, Zt is bivariate and comprises step lengths and turning angles (or directions). Each (bivariate) observation is assumed to be generated by one of N distributions, and the unobservable (hidden) state sequence, S1, … , ST, modeled as a Markov chain, determines which of the N distributions is selected at each time t. In the context of animal movement, the states can be interpreted as behavioral states of the observed animal, such as “foraging” or “encamped”; Morales et al. (2004) provide a more detailed elaboration of this concept.

HMMs have precisely the same dependence structure as SSMs (see Appendix A for an illustration). Some authors in fact regard HMMs and SSMs as the same type of objects (Cappé et al. 2005), although an HMM is usually regarded as a special case of an SSM where the number of hidden states is finite. In SSM approaches to animal movement modeling, it is commonly assumed that the movement metrics are part of the hidden component, i.e., not directly observed due to measurement error (Patterson et al. 2008). Here, we assume that measurement error is negligible and, thus, that we observe the movement metrics directly. This reduces flexibility in terms of incorporating measurement error, with the advantage that the resulting models are much more mathematically tractable than SSMs, because the hidden component takes only a finite number of values.

The Markov chain St is usually assumed to be of first order, which means that the probability of state occurrences at time t + 1 depends only on which state the chain is in at time t. The state transition probabilities (Pr) at time t are summarized in the N × N matrix $\Gamma^{(t)} = (\gamma_{ij}^{(t)})$, where

$$\gamma_{ij}^{(t)} = \Pr(S_{t+1} = j \mid S_t = i), \qquad i, j = 1, \ldots, N.$$

Estimation, decoding, model selection, and model checking

In the homogeneous case, i.e., if the transition probabilities do not depend on the time index t, the likelihood ($\mathcal{L}$) of an HMM is given by the following matrix product:

$$\mathcal{L} = \delta^{(1)} P(z_1)\, \Gamma\, P(z_2) \cdots \Gamma\, P(z_T)\, \mathbf{1}^{\top}$$

where P(z) = diag(f1(z), … , fN(z)), with $f_n(z) = f(z \mid S_t = n)$ denoting the conditional probability density function of Zt, given St = n; 1 is a row vector of ones; z1, … , zT denote the observations; δ(1) is the initial distribution of the Markov chain; and diag refers to a diagonal matrix. The initial distribution is usually assumed to be the stationary distribution (Patterson et al. 2009). It is straightforward to deal with missing data: for a corresponding time index t, the matrix P(zt) is replaced by the N × N identity matrix. Maximum likelihood estimates of the parameters can be found by numerical likelihood maximization, subject to well-known technical issues arising in all optimization problems (Zucchini and MacDonald 2009). Alternatively, one may apply the expectation-maximization (EM) algorithm.
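The matrix product above can be evaluated with a scaled forward recursion, which avoids numerical underflow for long series. The following is a minimal sketch in Python, not the authors' R code from the Supplement; the function name, the two exponential step-length densities, and all parameter values are hypothetical and chosen only for illustration.

```python
import math

def hmm_log_likelihood(delta, gamma, densities, observations):
    """Log of delta P(z1) Gamma P(z2) ... Gamma P(zT) 1', via a scaled forward recursion.

    delta: initial distribution (length N); gamma: N x N transition matrix;
    densities: N functions f_n(z); observations: list, with None for a missing value.
    """
    n = len(delta)
    phi = list(delta)  # normalized forward probabilities
    log_lik = 0.0
    for t, z in enumerate(observations):
        if t > 0:
            # Multiply the row vector phi by Gamma.
            phi = [sum(phi[i] * gamma[i][j] for i in range(n)) for j in range(n)]
        if z is not None:
            # Multiply by the diagonal matrix P(z). A missing z leaves phi unchanged,
            # which is the same as multiplying by the identity matrix.
            phi = [phi[j] * densities[j](z) for j in range(n)]
        scale = sum(phi)
        log_lik += math.log(scale)
        phi = [value / scale for value in phi]
    return log_lik


# Hypothetical two-state example: exponential step-length densities with means 1 and 4.
delta = [0.5, 0.5]
gamma = [[0.9, 0.1], [0.2, 0.8]]
densities = [lambda z: math.exp(-z), lambda z: 0.25 * math.exp(-0.25 * z)]
steps = [0.3, None, 5.1]  # the second observation is missing
print(hmm_log_likelihood(delta, gamma, densities, steps))
```

In practice this function would be passed (with a sign change) to a numerical optimizer over suitably transformed parameters; that step is omitted here.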

Via the Viterbi algorithm, one can obtain estimates of the underlying hidden behavioral states, s1, … , sT, given an observed sequence of movements (Zucchini and MacDonald 2009: Chapter 5).
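As an illustration of state decoding, the sketch below implements the Viterbi recursion on the log scale for a homogeneous HMM. It is an assumption-laden example rather than the implementation used in the paper: the function name, the exponential densities, and the parameter values are hypothetical, and states are labeled from 0.

```python
import math

def viterbi(delta, gamma, densities, observations):
    """Most likely hidden state sequence (0-based labels) under a homogeneous HMM."""
    n = len(delta)

    def log(x):
        return math.log(x) if x > 0 else float("-inf")

    # xi[j]: log-probability of the best path that ends in state j at the current time.
    xi = [log(delta[j]) + log(densities[j](observations[0])) for j in range(n)]
    back_pointers = []
    for z in observations[1:]:
        pointers, new_xi = [], []
        for j in range(n):
            best_i = max(range(n), key=lambda i: xi[i] + log(gamma[i][j]))
            pointers.append(best_i)
            new_xi.append(xi[best_i] + log(gamma[best_i][j]) + log(densities[j](z)))
        back_pointers.append(pointers)
        xi = new_xi
    # Backtrack from the most probable final state.
    state = max(range(n), key=lambda j: xi[j])
    path = [state]
    for pointers in reversed(back_pointers):
        state = pointers[state]
        path.append(state)
    path.reverse()
    return path


# Hypothetical example: state 0 has short steps (mean 0.1), state 1 long steps (mean 1).
delta = [0.5, 0.5]
gamma = [[0.9, 0.1], [0.2, 0.8]]
densities = [lambda z: 10.0 * math.exp(-10.0 * z), lambda z: math.exp(-z)]
steps = [0.05, 0.08, 1.5, 2.0, 0.04]
print(viterbi(delta, gamma, densities, steps))
```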

Furthermore, because there is an explicit, closed-form likelihood, it is straightforward to employ information criteria for model selection. Such measures compare the relative goodness of fit, while for example an analysis of the residuals provides a measure of absolute goodness of fit. Pseudo-residuals offer a convenient way for model checking in HMMs (Patterson et al. 2009; see also Appendix B).

HMMs for a Single Animal's Movement

We consider multistate random walks in the spirit of Morales et al. (2004). These can be regarded as HMMs where each state is associated with a distinct random walk behavior. The class of building blocks that we consider for the state-dependent process Zt comprises correlated and biased random walks (CRWs and BRWs, respectively), as well as walks that are both correlated and biased (BCRWs); see Codling et al. (2008) and Appendix C for more details. A typical HMM for animal movement will involve some combination of CRWs, BRWs, and BCRWs, each of them being allocated to a different state of the underlying Markov chain. Four sample movement trajectories simulated from HMMs with different components are given in Appendix D, illustrating the flexibility of these models. More ecological realism may be obtained, e.g., by allowing for seasonality or modeling parameters as a function of environmental covariates (see, e.g., Morales et al. 2004). Including covariates is straightforward in HMMs (Patterson et al. 2009).

A more specific question concerns the choice of the state-dependent distributions. Step lengths are, by nature, positive and continuous, which renders gamma (with the exponential as a special case) and Weibull distributions plausible candidates for modeling this component. Plausible distributions for modeling the circular-valued statistics given by turning angles (or directions) include the von Mises and wrapped Cauchy. All of these distributions have been used previously in animal movement models (Morales et al. 2004, Holzmann et al. 2006) and are equally easy to fit in the HMM framework.
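For concreteness, the Weibull and wrapped Cauchy densities can be written as below. This is a sketch only: the parameter names and the example values are hypothetical, and the joint state-dependent density assumes that step length and turning angle are conditionally independent given the state, which is an assumption made here for illustration.

```python
import math

def weibull_pdf(x, shape, scale):
    """Weibull density for a step length x > 0."""
    if x <= 0:
        return 0.0
    return (shape / scale) * (x / scale) ** (shape - 1) * math.exp(-((x / scale) ** shape))

def weibull_mean(shape, scale):
    """Mean of the Weibull distribution; proportional to the scale parameter."""
    return scale * math.gamma(1.0 + 1.0 / shape)

def wrapped_cauchy_pdf(angle, mean, rho):
    """Wrapped Cauchy density on the circle, with concentration 0 <= rho < 1."""
    return (1.0 - rho ** 2) / (2.0 * math.pi * (1.0 + rho ** 2 - 2.0 * rho * math.cos(angle - mean)))

def state_density(step, angle, params):
    """Joint density of (step, angle) in one state.

    Assumption: step length and turning angle are independent given the state.
    """
    return (weibull_pdf(step, params["shape"], params["scale"])
            * wrapped_cauchy_pdf(angle, params["mean"], params["rho"]))


# Hypothetical parameter values for two behavioral states.
encamped = {"shape": 1.2, "scale": 0.1, "mean": math.pi, "rho": 0.1}
exploratory = {"shape": 1.5, "scale": 0.5, "mean": 0.0, "rho": 0.6}
print(state_density(0.4, 0.2, encamped), state_density(0.4, 0.2, exploratory))
```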

Extensions

Other formulations of the state process

Semi-Markov state processes

One important limitation of basic HMMs is that the state dwell-time distributions (i.e., the times spent in a state before changing to a different state) are necessarily geometric. In some applications this may not be biologically reasonable, in which case models that provide more flexibility are called for. So-called hidden semi-Markov models (HSMMs) are designed to relax this condition: the dwell time in states of an HSMM is explicitly modeled from some distribution on the positive integers (Guédon 2003). The state process of an HSMM is determined by the probability mass functions (pmfs) of the dwell-time distributions, p1(r), … , pN(r) (with pn(r) the probability that the duration of a stay in state n is r time units), and the conditional state transition probabilities, given the current state is left: γij = Pr(St+1 = j | St = i, St+1 ≠ i), i, j = 1, … , N. The difference from HMMs is that the times spent in the states, which in an HMM are determined by the probabilities of self-transitions, γii, are now modeled separately by the distributions pi.

Langrock and Zucchini (2011) demonstrate how any given HSMM can be approximated with arbitrary accuracy by a suitably structured HMM, which means that HSMMs can be fitted in the well-developed HMM framework. We illustrate the basic idea via a simple example. Suppose that a two-state HSMM is to be fitted, with state-dependent distribution fn for state n = 1, 2, pmf p1(r) from some non-geometric distribution with infinite support (e.g., a Poisson distribution), p2(r) a geometric distribution, and a uniform initial state distribution. We now construct an HMM that approximates this HSMM by expanding the state that has a non-geometric dwell time into a set of N* states (where N* is some large number) and suitably structuring the transition probabilities between those N* states. Consider an HMM with N* + 1 states, initial distribution δ(1) = (0.5, 0, … , 0, 0.5), and state-dependent distributions given by $f_k^{*} = f_1$ for k = 1, … , N* and $f_{N^{*}+1}^{*} = f_2$. Each state in the set $I = \{1, \ldots, N^{*}\}$ is associated with the same distribution of the state-dependent process (I is called a state aggregate). We further define the (N* + 1) × (N* + 1) transition probability matrix of the HMM as

$$\Gamma = \begin{pmatrix}
0 & 1 - c_1(1) & 0 & \cdots & 0 & c_1(1) \\
0 & 0 & 1 - c_1(2) & \cdots & 0 & c_1(2) \\
\vdots & & & \ddots & \vdots & \vdots \\
0 & 0 & 0 & \cdots & 1 - c_1(N^{*}-1) & c_1(N^{*}-1) \\
0 & 0 & 0 & \cdots & 1 - c_1(N^{*}) & c_1(N^{*}) \\
p_2(1) & 0 & 0 & \cdots & 0 & 1 - p_2(1)
\end{pmatrix}$$

where, for r = 1, … , N*,

$$c_1(r) = \frac{p_1(r)}{1 - \sum_{k=1}^{r-1} p_1(k)},$$

with the usual convention that the empty sum equals 0. It can easily be shown (see Appendix G) that the pmf $p_1^{*}(r)$ of the dwell time in I matches p1(r) for r = 1, … , N*. For r > N*, $p_1^{*}(r)$ decays geometrically, which, in general, is only an approximation of p1(r). The approximation, however, can be made arbitrarily accurate by increasing N*. We have thus represented the dwell time in state 1 of the HSMM as the dwell time in the state aggregate I of the approximating HMM (and recall that at the observation level there is no difference between state 1 of the HSMM and states 1, … , N* of the HMM). This formulation does not entail an increase in the number of parameters, compared to the HSMM that is to be represented. In order to get an accurate approximation, this representation typically will involve large matrices, which makes these models computationally more demanding to fit than basic HMMs. The same is true for alternative strategies for estimating HSMMs (Guédon 2003). Crucially, the structure of the transition probability matrix is the same irrespective of what dwell-time distribution is to be fitted. For a more comprehensive description of this method, we refer the reader to Langrock and Zucchini (2011).
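The construction in the two-state example can be checked numerically. The sketch below builds a transition matrix in which the aggregate state for dwell step r is left with probability p1(r) divided by the probability of not having left earlier, and then verifies that the implied dwell-time pmf of the aggregate agrees with p1(r) up to N*. It is an illustration under stated assumptions, not the code from the Supplement: the function names are hypothetical, the Poisson distribution is shifted by one so that its support is the positive integers, and the guard against a vanishing denominator is added here for numerical safety.

```python
import math

def shifted_poisson_pmf(r, lam):
    """Assumed dwell-time pmf on r = 1, 2, ...: a Poisson distribution shifted by one."""
    return math.exp(-lam) * lam ** (r - 1) / math.factorial(r - 1)

def hazard_rates(pmf, n_star):
    """c(r) = p(r) / (1 - sum_{k<r} p(k)) for r = 1, ..., n_star."""
    rates, cumulative = [], 0.0
    for r in range(1, n_star + 1):
        p = pmf(r)
        remaining = 1.0 - cumulative
        # Assumption: treat a numerically vanishing denominator as certain departure.
        rates.append(p / remaining if remaining > 1e-12 else 1.0)
        cumulative += p
    return rates

def approximating_tpm(pmf1, p2_leave, n_star):
    """Transition matrix with aggregate states 0..n_star-1 and HSMM state 2 as the last state."""
    c = hazard_rates(pmf1, n_star)
    size = n_star + 1
    tpm = [[0.0] * size for _ in range(size)]
    for r in range(n_star):
        nxt = min(r + 1, n_star - 1)  # the last aggregate state loops onto itself
        tpm[r][nxt] = 1.0 - c[r]
        tpm[r][n_star] = c[r]
    tpm[n_star][0] = p2_leave          # geometric dwell time in state 2
    tpm[n_star][n_star] = 1.0 - p2_leave
    return tpm

def aggregate_dwell_pmf(tpm, n_star, max_r):
    """Pr(dwell time in the aggregate = r) for r = 1, ..., max_r, entering at its first state."""
    occupancy = [1.0] + [0.0] * (n_star - 1)
    probs = []
    for _ in range(max_r):
        probs.append(sum(occupancy[k] * tpm[k][n_star] for k in range(n_star)))
        occupancy = [sum(occupancy[k] * tpm[k][j] for k in range(n_star))
                     for j in range(n_star)]
    return probs


n_star = 10
pmf1 = lambda r: shifted_poisson_pmf(r, 3.0)
tpm = approximating_tpm(pmf1, 0.2, n_star)
dwell = aggregate_dwell_pmf(tpm, n_star, 12)
print(max(abs(dwell[r - 1] - pmf1(r)) for r in range(1, n_star + 1)))
```

Beyond r = N* the ratio of successive dwell-time probabilities is constant, which is the geometric tail described in the text.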

Higher-order state processes

On only a slightly different note, another crucial assumption in the basic HMM is that the state process has a memory of one time lag only. Theoretically, it is easy to increase the memory of the Markov chain, and there are simple means for implementing the associated models. For example, if the Markov chain, St, is of second order, then the Markov chain Ut = (St–1, St) is of first order. Although higher-order state processes can also imply non-geometric state dwell-time distributions, there is a conceptual difference to the semi-Markov approach: here one increases the flexibility in modeling the animal's memory, including the memory of particular state switches, whereas in the semi-Markov approach, the time of staying in a state is modeled explicitly, but is unaffected by which behavioral state the animal occupied previously.
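The reduction of a second-order chain to a first-order chain on pairs can be sketched as follows. The function name and the example probabilities are hypothetical; the input is assumed to be a nested list with entry [i][j][k] equal to Pr(St+1 = k | St–1 = i, St = j), using 0-based state labels.

```python
def pair_chain_tpm(second_order):
    """First-order transition matrix of U_t = (S_{t-1}, S_t) for a second-order chain S_t.

    The pair (i, j) is mapped to row/column index i * N + j. A move from (i, j) to
    (j2, k) is only possible when j2 == j, so all other entries stay at zero.
    """
    n = len(second_order)
    size = n * n
    tpm = [[0.0] * size for _ in range(size)]
    for i in range(n):
        for j in range(n):
            for k in range(n):
                tpm[i * n + j][j * n + k] = second_order[i][j][k]
    return tpm


# Hypothetical second-order probabilities for N = 2 states.
second_order = [
    [[0.9, 0.1], [0.3, 0.7]],  # previous state 0, current state 0 or 1
    [[0.6, 0.4], [0.1, 0.9]],  # previous state 1, current state 0 or 1
]
for row in pair_chain_tpm(second_order):
    print(row)
```

The resulting N² × N² matrix can then be used in the standard HMM likelihood, with the state-dependent distribution of the pair (i, j) determined by its second component j.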

Feedback models

Another strategy for modifying the dependence structure is to incorporate feedback from the observed process on the subsequent generation of states. Zucchini et al. (2008) modeled feeding behavior using an HMM-type feedback model. In their application, the observed behavior is either feeding or non-feeding, and the particular behavior observed influences the probabilities of transitioning to subsequent states (i.e., there is additional dependence of St+1 on Zt as determined by a functional relation between the observation at time t and the probabilities of state transitions in the time interval [t, t + 1]).

Continuous-valued behavioral states

Another critical issue concerns the state space (i.e., the set of possible behavioral states), because in some applications the division into a finite number of states may seem counterintuitive. The likelihood of a model with a continuous state space (i.e., that of an SSM) is given by a high-order multiple integral that usually cannot be evaluated analytically. However, by finely discretizing the state space, one is back in the HMM framework, such that estimation is not considerably more challenging than for models with a finite number of states (Langrock 2011, Pedersen et al. 2011a).

Multiple animals

Scaling individual movement models up to the population level is an issue of great ecological relevance, but it first requires us to quantify individual variation. The class of HMMs provides several different strategies for dealing with a set of time series of individual animal movement paths. The simplest approach is to assume that the model parameters are identical across individuals (complete pooling). This approach can be unrealistic because it does not permit any individual variability. Conversely, it could be assumed that each parameter is individual-specific, i.e., that each individual has its own independent set of parameters (no pooling). This approach involves a larger number of parameters and generally requires ad hoc comparisons among individuals after the analysis. Patterson et al. (2009) compare these approaches in an HMM application to tagging data from southern bluefin tuna (Thunnus maccoyii). An important alternative to these extreme options is a hierarchical model (partial pooling), where one assumes a common distribution for (some) individual-level parameters; these are then called random effects. For implementations of such movement models in a Bayesian framework, see e.g., Jonsen et al. (2006) and Eckert et al. (2008). For a likelihood-based implementation of a hierarchical HMM in a different ecological context, see Schliehe-Diecks et al. (2012). Compared to the no-pooling approach, a hierarchical model substantially reduces the number of parameters to be estimated, and allows joint inference about population-level parameters. The implementation of random effects can be computationally intensive because the likelihood needs to be integrated over the random effects' distribution (Altman 2007):

$$\mathcal{L} = \prod_{m=1}^{M} \int_{\mathbb{R}^d} \delta^{(1)} P(z_{1,m})\, \Gamma\, P(z_{2,m}) \cdots \Gamma\, P(z_{T,m})\, \mathbf{1}^{\top} f(\theta)\, d\theta$$

Here $\theta \in \mathbb{R}^d$ denotes the vector of random effects (from the space of all d-dimensional vectors with real-valued components), on which the terms of the matrix product in the integrand may depend; M is the number of individuals (animals); zt,m denotes the observation made at time t for individual m; and f is the joint density of the random effects, with its choice depending on which parameters are to be modeled as random effects. It is further assumed that individuals act independently of each other. Numerical likelihood maximization, which we conducted in the application to bison in the next section, is only feasible for models involving few random effects; the other parameters are assumed to be common to all individuals. Monte Carlo EM (expectation-maximization) has been suggested when many random effects are included (Altman 2007). A computationally less demanding way of accounting for heterogeneity across individuals is to incorporate individual-specific covariates in the model. This strategy requires that suitable covariate data are available.
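To make the integration over the random effects concrete, the sketch below evaluates a hierarchical log-likelihood for the simplest case of a single normally distributed random effect (d = 1), using a midpoint rule on a grid and the log-sum-exp trick. Everything specific here is an assumption for illustration: the function names, the exponential step-length densities whose means are scaled by exp(θ), the grid settings, and the data. It is not the model fitted in the bison application.

```python
import math

def hmm_log_lik(delta, gamma, densities, obs):
    """Scaled forward recursion for the log-likelihood of one individual's series."""
    n = len(delta)
    phi, log_lik = list(delta), 0.0
    for t, z in enumerate(obs):
        if t > 0:
            phi = [sum(phi[i] * gamma[i][j] for i in range(n)) for j in range(n)]
        phi = [phi[j] * densities[j](z) for j in range(n)]
        scale = sum(phi)
        log_lik += math.log(scale)
        phi = [v / scale for v in phi]
    return log_lik

def normal_log_pdf(x, mu, sigma):
    return -0.5 * math.log(2.0 * math.pi) - math.log(sigma) - 0.5 * ((x - mu) / sigma) ** 2

def densities_for(theta):
    """Assumed model: exponential step lengths whose state means are scaled by exp(theta)."""
    means = [0.1 * math.exp(theta), 1.0 * math.exp(theta)]
    return [lambda z, m=m: math.exp(-z / m) / m for m in means]

def hierarchical_log_lik(tracks, delta, gamma, mu, sigma, grid_size=61, width=5.0):
    """Sum over individuals of log of the integral of L_m(theta) f(theta) over theta."""
    lower = mu - width * sigma
    h = 2.0 * width * sigma / grid_size
    total = 0.0
    for obs in tracks:
        terms = []
        for g in range(grid_size):
            theta = lower + (g + 0.5) * h  # midpoint of grid cell g
            terms.append(hmm_log_lik(delta, gamma, densities_for(theta), obs)
                         + normal_log_pdf(theta, mu, sigma) + math.log(h))
        top = max(terms)  # log-sum-exp for numerical stability
        total += top + math.log(sum(math.exp(v - top) for v in terms))
    return total


# Hypothetical step lengths for two individuals.
delta = [0.5, 0.5]
gamma = [[0.9, 0.1], [0.2, 0.8]]
tracks = [[0.05, 0.2, 1.3, 0.9], [0.6, 0.02, 0.03, 2.1, 1.0]]
print(hierarchical_log_lik(tracks, delta, gamma, mu=0.0, sigma=0.5))
```

With d random effects the grid grows exponentially in d, which illustrates why direct numerical maximization is only feasible for few random effects.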

Application: Bison in Saskatchewan, Canada


We consider locations of nine American bison (Bison bison), recorded between October 2005 and April 2006 with GPS radio collars in Prince Albert National Park, Saskatchewan, Canada. The observations are spaced at regular time intervals every three hours. More data are available (Babin et al. 2011), but for simplicity we restricted the present analysis to the winter of 2005/2006. The bison behavior varies with the season, but within one winter, homogeneity over time can reasonably be assumed. Between 1427 and 1660 locations were observed for each of the nine bison (there were some missing data). Plots of the nine movement paths are given in Appendix E. From the locations, we calculated turning angles (in radians) between subsequent movement directions, and the associated Euclidean step lengths (in km).
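The derivation of step lengths and turning angles from a sequence of locations can be sketched as follows. The function name and the example coordinates are hypothetical, the coordinates are assumed to be already projected to a planar system in km, and missing locations are not handled in this sketch.

```python
import math

def steps_and_turning_angles(xs, ys):
    """Euclidean step lengths and turning angles (radians, wrapped to [-pi, pi]).

    A track of T locations gives T - 1 steps and T - 2 turning angles. For a
    zero-length step the heading is undefined; atan2(0, 0) returns 0 here.
    """
    steps, headings = [], []
    for t in range(len(xs) - 1):
        dx, dy = xs[t + 1] - xs[t], ys[t + 1] - ys[t]
        steps.append(math.hypot(dx, dy))
        headings.append(math.atan2(dy, dx))
    angles = []
    for t in range(1, len(headings)):
        change = headings[t] - headings[t - 1]
        # Wrap the change in heading onto the circle.
        angles.append(math.atan2(math.sin(change), math.cos(change)))
    return steps, angles


# Hypothetical track: three unit steps with two left turns of 90 degrees.
xs = [0.0, 1.0, 1.0, 0.0]
ys = [0.0, 0.0, 1.0, 1.0]
steps, angles = steps_and_turning_angles(xs, ys)
print(steps, angles)
```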


We describe the results of fitting two different types of models at the individual level (model 1 HMMs; model 2 HSMMs) and of fitting three different types of models simultaneously to all nine bison paths: HMMs (model 3) and HSMMs (model 4) with complete pooling, and finally a hierarchical HSMM (model 5). All models were estimated via numerical maximum likelihood; R code is given in the Supplement. In all cases, we assumed two states, stationarity of the (semi-)Markov chain and that each state is associated with a distinct CRW. We assumed Weibull distributions for the step lengths, and wrapped Cauchy distributions for the turning angles. (Gamma distributions and von Mises distributions, respectively, were outperformed in terms of the AIC; see Appendix F.) In the HSMMs, models 2 and 4, we assumed negative binomial distributions for the state dwell times, with probability mass function in state n (n = 1, 2) given by

$$p_n(r) = \frac{\Gamma(r + k_n - 1)}{\Gamma(r)\,\Gamma(k_n)}\, \pi_n^{k_n} (1 - \pi_n)^{r-1}, \qquad r = 1, 2, \ldots,$$

with parameters kn > 0 and πn ∈ [0, 1], where Γ(·) here denotes the gamma function. Note that the HMMs are nested within the HSMMs as the special case k1 = k2 = 1, for which the dwell times are geometric. The HSMMs were fitted using the HMM approximation method, as described in Extensions: Other formulations of the state process. Each of the two state aggregates that we used, corresponding to p1 and p2, respectively, consisted of 30 states, thus ensuring a very accurate approximation (with this size of the state aggregates, the HMM approximation is essentially exact).
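A small numerical check of the dwell-time model may help. The sketch assumes the negative binomial is shifted to the positive integers, i.e., p(r) = Γ(r + k − 1) / (Γ(r) Γ(k)) · π^k (1 − π)^(r−1) for r = 1, 2, …, and confirms that k = 1 gives the geometric dwell times of a basic HMM. The function name and parameter values are hypothetical, and π is restricted to the open interval (0, 1) so that the logarithms are defined.

```python
import math

def negbin_dwell_pmf(r, size, prob):
    """Shifted negative binomial pmf on r = 1, 2, ... with size k > 0 and 0 < prob < 1."""
    log_p = (math.lgamma(r + size - 1) - math.lgamma(r) - math.lgamma(size)
             + size * math.log(prob) + (r - 1) * math.log(1.0 - prob))
    return math.exp(log_p)


# With size = 1 the pmf reduces to the geometric distribution prob * (1 - prob)^(r - 1).
print([negbin_dwell_pmf(r, 1.0, 0.3) for r in range(1, 5)])

# With size > 1 the mode can exceed one, which a geometric distribution cannot produce.
print([negbin_dwell_pmf(r, 3.0, 0.4) for r in range(1, 5)])
```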

Model 5, a hierarchical two-state HSMM, assumes the Weibull scale parameter λn,i (n = 1, 2 denoting the behavioral state) for bison i to be drawn from a log-normal distribution, i.e., log(λn,i) ∼ N(μn, σn). All other parameters were assumed to be common across individuals in this model. Biologically this parameterization means that the expected state-dependent step lengths vary across animals (because the mean of the Weibull distribution is proportional to the scale parameter).


Details of the model-fitting results are given in Appendix F. According to the fitted individual-specific HMMs and HSMMs (models 1 and 2), the movement paths of the different bison show a similar (stochastic) pattern. In particular, each bison switches between an “exploratory” (or “commuting”) state, with many long steps and few turnings, and an “encamped” state, with short steps and more frequent reversals (Fig. 1). The results are similar to those obtained for elk by Morales et al. (2004). Similarity in the patterns of the nine movement paths is stressed by the fact that when all nine bison are modeled simultaneously (models 3 and 4), the resulting AICs are lower than those of their respective counterparts with individual-specific parameters (i.e., the AICs resulting from the joint likelihood of the individual-specific models 1 and 2, respectively).

Figure 1.

Models 2 and 4 for the movement paths of nine American bison: fitted state-dependent Weibull distributions for step lengths (left) and wrapped Cauchy distributions for turning angles (right). Solid lines show “exploratory” movement; dashed lines show “encamped” movement. Gray lines are results for the individual-specific HSMMs, and black lines are results for HSMM with parameters common to all individuals.

The HSMM with complete pooling, model 4, has lower AIC than the HMM with complete pooling, model 3. At the individual level, HSMMs are, in terms of the AIC, superior to their HMM competitors in seven out of nine cases. In the “exploratory” state, for seven out of nine bison the mode of the HSMM-derived dwell-time distribution is greater than one. In the “encamped” state, the most apparent difference between the fitted dwell-time distributions of the HMMs and those of the HSMMs is that in the latter case there is more mass on one in seven out of nine cases. A typical movement pattern thus consists of several successive commuting-type steps that are disrupted by pauses that mostly last only one time unit (here: three hours). Residual analyses conducted for the fitted individual-based HSMMs show that these models provide adequate fits (see Appendix B). However, the fitted models predict more steps of extremely short length (<10 m) than were observed (possibly related to GPS measurement error).

In terms of the AIC, the hierarchical HSMM, model 5, performs better than any other model that we considered (ΔAIC = 5.0, compared to the HSMM with complete pooling). The parameter estimates and 95% confidence intervals for the hierarchical HSMM are given in Table 1. Note that neither of the two confidence intervals associated with the size parameters of the state dwell-time distributions contains the value 1 (which corresponds to the special case of a geometric distribution, i.e., the HMM case). The relatively high uncertainty in the estimates associated with the random effects distributions is most likely due to the small number of individuals. According to the fitted random effects distributions, 95% of the individual-specific mean step lengths are in the intervals [0.073, 0.112] in the “encamped” state and [0.464, 0.564] in the “exploratory” state.

Table 1. Estimated parameters with 95% confidence limits for model 5 (hierarchical hidden semi-Markov model) where state i = 1 is the “encamped” state and state i = 2 is the “exploratory” state.

Discussion

We outlined various extensions of basic HMM methods that may prove useful in animal movement modeling. Many of these extensions have been developed only recently in the statistical literature, implying that the full scope of HMMs for modeling animal movement data has not yet been recognized in the ecological literature.

In particular, we have demonstrated how considerable flexibility can be gained with modest effort by allowing for semi-Markov state processes. In an HSMM, one explicitly models the time an animal stays in a behavioral state, rather than simply accepting the geometric decay of the dwell-time distribution imposed by basic HMMs. We believe this extension to be potentially very important for ecological time series, not only because it may be more realistic and thus improve the fit, but also because it may lead to important insights into the dynamics of behavioral changes that basic Markov models cannot provide.

Another important issue is hierarchical modeling of animal movement paths. Hierarchical models, in principle, allow us to quantify the difference between movement patterns of different animals. In the ecological literature, hierarchical movement models have been fitted several times using Bayesian methods (e.g., Jonsen et al. 2006, Eckert et al. 2008). We have demonstrated that such models can also be fitted in a likelihood-based HMM framework, provided that the number of random effects is small. We expect that in many applications the simultaneous incorporation of both individual-specific covariates and (few) random effects in the model will adequately account for possible heterogeneity, while being feasible and directly interpretable.

Two limitations of the HMM approach to animal movement modeling need to be mentioned. First, the approach discussed here is suited only to regularly timed observations. If time intervals between successive observations vary, it makes no sense to assume state transition probabilities and state-dependent distributions to be homogeneous. Missing observations do not represent a problem, but for data that involve observations that are completely irregularly spaced in time, continuous-time models such as the Ornstein-Uhlenbeck process (Blackwell 2003) or a continuous-time CRW (Johnson et al. 2008) seem better suited. Such models allow for a better independence between the scales of behavior and those of observations, but are considerably more challenging.

Second, we suggest that the type of models presented here should only be used for time series with negligible observation error. In principle, one may assume that the observations are subject to some random error, but this quickly leads to technical problems (e.g., state-dependent distributions become convolutions that, in most cases, cannot be evaluated analytically). In order to account for measurement error, Pedersen et al. (2011b) present an alternative continuous-time HMM approach, based on a multistate advection−diffusion model that involves a grid discretization of space.

Finally, as with other movement modeling techniques, care should be taken in considering the temporal scale of analysis. The analyzed bison data have a resolution of 3 hours and our HMM formulations imply that animals are allowed to make a “behavioral switch” at this temporal scale, which is unrealistic. Thus, the classification of “encamped” vs. “exploring” should not be taken literally. Apart from this caveat, the models that we implemented were capable of capturing important features of the data.

Despite these limitations, the scope of HMMs in animal movement modeling is very wide, particularly as the relevance of the measurement error diminishes with technological advancements, and because in many studies locations are indeed observed at regular time intervals. We believe that their computational tractability (fitting an HMM to the movement path of one bison takes about 1 minute, or 10 minutes for an HSMM; and fitting the hierarchical HSMM took about 2 days) and mathematical simplicity will make this class of models attractive to ecologists.

Acknowledgments

The authors thank Tiago Marques, Martin Wæver Pedersen, and Ian Jonsen for very helpful comments on earlier versions of the manuscript. R. Langrock was funded by the Engineering and Physical Sciences Research Council (EPSRC reference EP/F069766/1).

Supplemental Material

Appendix A

Graphical model of an HMM for animal movement (Ecological Archives E093-220-A1).

Appendix B

Description of HMM model checking via pseudo-residuals, and plots of pseudo-residuals obtained in the bison application (Ecological Archives E093-220-A2).

Appendix C

Outline of the different types of random walks, and description of how biased random walks can be fitted in the HMM framework (Ecological Archives E093-220-A3).

Appendix D

Plots of movement trajectories simulated from four different types of HMMs (Ecological Archives E093-220-A4).

Appendix E

Plots of the bison movement paths (Ecological Archives E093-220-A5).

Appendix F

Model fitting results, including parameter estimates for the individual-specific models, log-likelihood and AIC values for all considered models, and plots of the fitted state dwell-time distributions in the individual-specific models (Ecological Archives E093-220-A6).

Appendix G

Verification of the HMM representation being an approximate representation of the example HSMM given in the section Extensions: Other formulations of the state process (Ecological Archives E093-220-A7).

Supplement

R code for fitting the individual-specific HMMs and HSMMs, for computing the HSMM residuals, for fitting the hierarchical HSMM, and observations for one of the bison (Ecological Archives E093-220-S1).