### Introduction

Movement ecology is currently undergoing a period of rapid expansion, fuelled by the increasing availability of high-quality animal relocation data sets. While the methods used to analyse movement data are also evolving, most are still based on discrete-time correlated random walks (CRWs) and their continuous-time diffusion approximations. For example, recently developed composite random walk methods represent elaborations of the basic CRW framework (Kareiva & Shigesada 1983; Turchin 1998; Codling, Plank & Benhamou 2008). While these models enjoy widespread use in ecology, they have a range of known limitations (Bovet & Benhamou 1988; Turchin 1998; Codling & Hill 2005; Nouvellet, Bacon & Waxman 2009; Gautestad 2012; Fleming *et al*. 2014) whose severity is not fully acknowledged or appreciated, despite repeated demonstrations. Key among these drawbacks is the unrealistic assumption that estimated step lengths and turn angles actually correspond to start and end points of discrete movement behaviours, whereas location fixes usually follow a programmed schedule that may have little relationship with important behavioural events (Bovet & Benhamou 1988). When discretizing a continuous movement path, step-length and turn-angle distributions are heavily influenced by the choice of sampling rate (Nouvellet, Bacon & Waxman 2009; Fleming *et al*. 2014), which is often based on logistical considerations such as managing GPS-collar battery life rather than *a priori* considerations of the important time-scales of the movement process. As a result, these quantities often reveal little about the movement process. The step-length and turn-angle distributions only play a fundamental role for truly discrete movement processes, where the sampling locations and times correspond to actual behavioural events (i.e. steps and turns).

The more realistic continuous-time stochastic process (CTSP) models have seen relatively limited use in movement ecology (but see Dunn & Gipson 1977; Brillinger & Stewart 1998; Blackwell 2003; Horne *et al*. 2007; Johnson *et al*. 2008; Gurarie, Andrews & Laidre 2009), and they have always suffered from the key drawback of assuming a particular CTSP model and then fitting it to the data in a fashion analogous to discrete CRW analysis. Prior to Fleming *et al*. (2014), we had not seen any examples of exploratory signal-processing analyses being used to motivate model selection, whereby movement behaviours are first identified in a general way, without assuming any particular model, and then models incorporating these behaviours are tested. Instead, very particular CTSPs, such as the Ornstein–Uhlenbeck process, have always been assumed *a priori*, rather than selected from a set of plausible candidate models. Finally, the likelihood functions used in all studies of which we are aware that fit CTSPs to movement data condition only on movements that occur at the sampling rate and effectively difference the data to yield a Markovian representation of the movement process that is only valid if the model is true. This results in a Markovian likelihood function that, while straightforward to implement, is heavily influenced by the choice of sampling rate. As we will show, this does not allow CTSPs to be used to their full potential.

Animal movement data often feature rich autocorrelation structure, including a broad range of spatial and temporal scales and, in some cases, persistence over very long periods of time (Polansky *et al*. 2010; Fleming *et al*. 2014). Although autocorrelation has long been considered a nuisance factor in spatiotemporal ecological data that must be ‘dealt with’ or ‘accounted for’ (Swihart & Slade 1985; Worton 1989), it can, when properly harnessed, reveal a wealth of information about the underlying movement behaviour (Boyce *et al*. 2010; Polansky *et al*. 2010; Fleming *et al*. 2014) and provide new paradigms for analysis (Legendre 1993). While there is growing awareness of the importance of autocorrelation in understanding animal movement, most current analytical methods can only deal with autocorrelation in a very limited way. For example, the CRW framework de-emphasizes the importance of autocorrelation by its very design. These models are constructed as Markov chains, where the current movement step depends only on the previous step. This limits the autocorrelation structure to simple autoregressive models, whether or not this type of model is the most relevant. And while CTSPs are inherently capable of incorporating a much broader array of autocorrelations, currently used Markovian likelihood functions effectively limit them to the same kind of short-range, fast-decaying autocorrelations as CRWs. For more general autocorrelated processes, movement at the current point in time may depend upon a long continuation of past movements. The strength of that dependence decays with time-lag according to characteristic movement time-scales that are generally unrelated to the sampling schedule. Therefore, the inclusion of dependence only on a single time step in the past is an artificial, sampling-dependent construction. 
While the perspective we advocate is much more difficult to treat mathematically (see, for instance, Adelman 1976; Calzetta, Roura & Verdaguer 2003), only non-Markovian statistical methods that use all possible time-lags in the data can account for and take advantage of the full sweep of autocorrelated behaviour in movement data.

Here, we introduce a non-Markovian likelihood function that can unlock the full power of CTSPs for analysing and understanding animal movement. This likelihood function can fit autocorrelated movement models that can be constructed without any assumption of stationarity or hidden Markov property, whereas previously considered likelihood functions assume that autocorrelated telemetry data can be differenced to obtain ‘hidden’ Markov processes. The sampling schedule is given no special importance in our treatment, and instead, all unobserved times are properly marginalized out of the full-time distribution. In other words, we assume an underlying continuous-time path and correctly treat discrete relocation data as an incomplete sample of this path. Autocorrelation in the data is not avoided by differencing or thinning; instead, it is modelled and fitted using all lags in the data. In this way, all noted drawbacks of the previous analyses can be avoided, and arbitrary autocorrelation structures can be tested and compared via likelihood ratio test or AIC. As a case study, we consider a hierarchy of increasingly complex models, from Brownian motion to a multiscale movement model derived in Fleming *et al*. (2014), with and without assumptions of isotropy, at the population and individual level. The range of included behaviours is beyond the scope of previous analyses and is only possible due to the efficiency and generality of the non-Markovian likelihood function. This family of models is applied to a sample of 36 Mongolian gazelles – a species previously described as nomadic (Olson *et al*. 2010; Mueller *et al*. 2011) and later identified as exhibiting movement behaviours at multiple scales (Fleming *et al*. 2014).

#### The time-series analysis framework

In the CRW framework, there is a straightforward procedure for inspecting the step-length and turn-angle distributions, proposing candidate models and then selecting between them. There is an analogous procedure for continuous-time models, although with a slight difference. In general, one cannot estimate the distribution of a continuous-time process, but one can estimate the moments and cumulants of the process, such as the mean (first cumulant) and autocorrelation function (second cumulant). The first two cumulants alone are sufficient to describe drift and diffusion and completely define any Gaussian stochastic process that has a finite range. Furthermore, the cumulants are not limited to any assumption of stationarity or Markov property. Therefore, the cumulants, which may be considered the fundamental statistics of the stochastic process (Fleming *et al*. 2014), are given primacy in our analysis, rather than the distribution.

Our framework begins by estimating the movement process's autocorrelation structure nonparametrically. The variogram (Cressie 1993; Fleming *et al*. 2014) is an excellent estimator of the mean squared displacement (MSD) for stationary processes. It uses the data much more thoroughly than the conventional MSD statistic (Fleming *et al*. 2014, appendix A.1), allowing investigators to better test the fit of the MSD pattern to a particular movement mechanism and model. The periodogram (Lomb 1976; Scargle 1982), which represents the autocorrelation structure in the frequency domain, can be used to identify cycles and periodicities in the movement. One might be inclined to skip nonparametric estimation and perform a model selection based upon likelihood and AIC. However, this assumes that all relevant models will be included in the AIC comparison (and in their relevant parameter regimes). Our previous analysis of Mongolian gazelles offers a good example against this practice, as our best movement model was invented only after noting a significant discrepancy between the variogram and Ornstein–Uhlenbeck motion (Fleming *et al*. 2014).
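As a rough illustration of this first, nonparametric step, the sketch below computes a simple binned empirical semi-variogram from irregularly sampled relocations. It is a minimal version written for this illustration, not the modified estimator of Fleming *et al*. (2014): the function name `empirical_variogram`, the `bin_width` argument, the unweighted averaging and the convention of averaging over the two spatial axes are all assumptions.

```python
from collections import defaultdict


def empirical_variogram(times, xs, ys, bin_width):
    """Binned semi-variance estimate using every pair of relocations.

    Returns a dict mapping the lag at each bin centre to the isotropic
    semi-variance, i.e. half the mean squared displacement per axis.
    """
    sums = defaultdict(float)
    counts = defaultdict(int)
    n = len(times)
    for i in range(n):
        for j in range(i + 1, n):
            lag = abs(times[j] - times[i])
            k = int(round(lag / bin_width))  # index of the nearest lag bin
            if k == 0:
                continue  # skip pairs closer in time than half a bin
            sq_disp = (xs[j] - xs[i]) ** 2 + (ys[j] - ys[i]) ** 2
            # Factor 1/2 for semi-variance and 1/2 for averaging x and y.
            sums[k] += sq_disp / 4.0
            counts[k] += 1
    return {k * bin_width: sums[k] / counts[k] for k in sorted(sums)}
```

Because every pair of fixes contributes to some lag bin, gaps and mixed sampling intervals pose no difficulty for this estimator, although bins that contain few pairs will be noisy.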

The second step, which this work focuses on, is to select the best model and estimate model parameters. There are three classes of approaches to parametric autocorrelation estimation: weighted variogram regression (Cressie 1993), Chi-squared (*χ*^{2}) periodogram regression (Whittle 1953) and non-Markovian maximum likelihood (Table 1). Weighted variogram regression, which we used in Fleming *et al*. (2014), is the most accessible of the three methods, but it does not produce reliable confidence intervals (Fleming & Calabrese 2013). *χ*^{2} periodogram regression is the most computationally efficient of the three methods, but it only applies to evenly sampled data. Here, we derive the likelihood function for data of a specified mean and autocorrelation function, as given by a particular model of movement. Non-Markovian maximum likelihood accurately and efficiently uses limited data and is completely insensitive to the irregular sampling and data gapping often found in animal relocation data sets. We emphasize that in our approach there are no assumptions pertaining to the data quality (in terms of gaps), amount of data, or hidden Markov property (i.e. that one can difference the data enough times and rid it of autocorrelations).

Once a suitable modelling approach is selected, the autocorrelation estimate can then be applied to better answer movement-related questions, as any movement-related quantity of interest necessarily requires knowledge of the autocorrelation function (ACF) to make optimal estimates. We can compare characteristic time and length scales based on different behavioural movement modes identified by the best-fit model. Using the likelihood function, we can also estimate probabilities for different movement behaviours in individuals and groups, and even at different times, as with composite random walk models (Morales *et al*. 2004; Jonsen, Flemming & Myers 2005; McClintock *et al*. 2012). Finally, current strategies to estimate home ranges such as kernel-density estimates (KDE) (Worton 1989) and minimal convex polygons (MCP) (Southwood 2000) are incomplete because they do not adequately take autocorrelation into account, which results in the underestimation of home-range sizes (Fleming *et al*. 2014, appendix E). To be valid, these estimators require a coarsening of the data, which discards information that a better estimator could use to improve the estimate (for an example of how small-scale details can improve a large-scale estimate, see Appendix S4). Similarly, the problem of predicting an animal's location when it is not sampled (interpolation and forecasting) is nearly impossible without a probabilistic movement model that fully accounts for autocorrelation (see, for the spatial-field analogue, Krige 1951; Cressie 1993). All of these quantities require knowledge of the movement process's autocorrelation structure to be optimally estimated. Estimators that ignore the underlying autocorrelation structure can be heavily biased, resulting in over- or underestimation of the desired quantities. In Fig. 1, we summarize the principal analysis flow from the nonparametric estimation of autocorrelation structure, to the parametric estimation of model parameters (and model selection) and to the estimation of quantities of interest related to movement.

#### The case study

To demonstrate the power of a full non-Markovian likelihood analysis, we analyse 5 years of relocation data for Mongolian gazelles collected in the eastern steppe of Mongolia using GPS-Argos collars (Fleming *et al*. 2013). Mongolian gazelles are intermediate herbivores (Jiang *et al*. 2002) that live in one of the world's largest remaining grasslands and have previously been described as nomadic (Olson *et al*. 2010; Mueller *et al*. 2011). They exhibit long distance movements, and although their individual ranges are smaller than the available grassland (Fleming *et al*. 2014), they do not reside in small home ranges that are typical for range resident species. It takes months for the gazelles to cross their range, yet they do not undertake regular migrations, nor do gazelles exhibit any obvious cycles or periodicities (Fleming *et al*. 2014). While the likelihood method can easily handle complexities such as migrations and repeating (periodic) behaviours, with the gazelle we only have to address the autocorrelation function proper and not a time-dependent mean. We consider a family of models including Brownian motion with the possibility of home-range fidelity at long time-scales and a ballistic phase at short time-scales, which might be related to commuting or foraging behaviours.

Model identification for this species was previously performed in Fleming *et al*. (2014) using variogram analysis. We repeat the same model selection here for demonstration purposes. However, we emphasize that, although maximum likelihood is the superior estimator, it does not yield any visual representation of the ACF, and so a variogram or periodogram analysis should always be the first step of analysis to ensure that all important movement behaviours are included in the modelling. Here in step two of our framework (Fig. 1), we employ maximum likelihood to obtain better parameter estimates and confidence intervals. The mean and ACF estimates can then be applied to calculating quantities of interest, which is what we do to estimate the individual- and population-level utilization distributions and 95% confidence region ranges.

To ensure a variety of sampling rates, individual collars were programmed for a pair of alternating collection intervals, with periods of 1, 5 and 25 h. Additionally, to preserve battery life, collars were programmed to have 10 day gaps after every 5 days of data collection. There was also data loss from malfunction; however, estimated telemetry errors were very small (details of the error estimates can be found in Fleming *et al*. 2014). It is important to note that this sampling schedule, which represents an attempt to balance fine-scale detail against collar longevity, is far from optimal and cannot be recommended. Specifically, the mix of different sampling rates and data gaps it introduced caused conventional variogram, periodogram and composite random walk methods to fail on this data set. After modifying conventional variogram analysis to handle such heterogeneity and data gaps (Fleming *et al*. 2014), we were limited to the analysis of the time-averaged and population-averaged isotropic (*x*–*y* averaged) autocorrelations. As a result, previously we could only make inferences about how an average gazelle moves in an average direction. However, the irregular sampling schedule and random gaps do not pose any problem for the non-Markovian likelihood function, and here, we can analyse individual movement paths and relax the isotropy assumption.

### Statistical framework and movement models

The first two cumulants of the stochastic process provide the backbone for our calculations: the mean *μ*(*t*):

- (eqn 1)

and the autocorrelation function *σ*(*t*,*t*′):

- (eqn 2)

where *t* denotes the time index, **r** = (*x*,*y*) denotes the location, and 〈⋯〉 denotes the average over realizations of the process. Different movement models predict different functional forms for the first two cumulants. Our approach to fitting movement models to relocation data is to first calculate the model mean and ACF in terms of the movement parameters, such as the characteristic time and length scales, and then to maximize the likelihood of the location data with respect to these parameters. A good method of fitting the first two cumulants to the data is to assume a Gaussian distribution for the relocation time series. This produces standard formulas for method-of-moments estimates (Pawitan 2001), standard formulas for the periodogram in the case of stationary cumulants (Dembo 1986), an asymptotically normal estimator for any autoregressive moving-average (ARMA) model (Fan & Yao 2003), and it is the distribution of maximum entropy given no further knowledge of the higher-order cumulants (Cover & Thomas 2006). For time-correlated movement, this distributional assumption is fairly permissive, in that it restricts neither the step-length distribution (given a discrete sampling) nor the home range to be a Gaussian function, and can incorporate any number of movement behaviours.

The Gaussian probability density functional *P*[**r**] of a realized trajectory **r**(*t*) is formally given by

- (eqn 3)

This is a continuous generalization of the multivariate Gaussian probability distribution, where *t* acts as an index for the Gaussian random variable **r**(*t*), which is distinct from, yet correlated to, the Gaussian random variable **r**(*t*′) at a different time *t*′. The continuous trajectory is only sampled a countable number of times, and so we marginalize over all times for which data are not observed. This results in the marginal probability distribution of the observed time-series data

- (eqn 4)

in terms of the location data **R**, sampled mean **M** and sampled autocorrelation **Σ**, with components

- (eqn 5)

where the observed times *t*_{i} are indexed by *i* ∈ {1,2,3,⋯}. Arbitrary mean and autocorrelation functions can be considered for the stochastic process, which can incorporate various movement behaviours and parameters. The likelihood function does not depend on how the data were sampled or on whether they contain gaps. The only mathematical requirement is that the autocorrelation function is positive definite: **Σ** must have all positive eigenvalues. Note that for *q* spatial dimensions and *n* sampled times, **R** and **M** are *n* × *q* dimensional, while **Σ** is *n* × *n* × *q* × *q* dimensional. These arrays must be properly flattened into *nq*-dimensional vectors and (*nq*) × (*nq*)-dimensional matrices to use conventional matrix operations. We will explicitly do this in the case of two dimensions.

Following probability distribution (4), the log-likelihood function is given by

- (eqn 6)

to within a constant. Similar likelihood functions have been derived in other contexts (e.g. see Mardia & Marshall 1984, for spatial fields). An AIC comparison or likelihood ratio test will then allow us to determine what behaviours, as expressed by parameters in the mean and autocorrelation function, are suggested by the data. In the family of models we consider, two processes have historically been used to describe animal movement: Brownian motion (BM) and Ornstein–Uhlenbeck (OU) motion. The OU process (Gardiner 2009) describes a random search over an area that grows ever more slowly in time and asymptotes to a finite value.
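The evaluation of a Gaussian log-likelihood of this kind can be sketched as follows. This is a minimal illustration that assumes the standard multivariate Gaussian form, −½ log det **Σ** − ½ (**R** − **M**)ᵀ **Σ**⁻¹ (**R** − **M**), with the data already flattened into a single residual vector. The helper names are hypothetical, and the pure-Python Cholesky routine is used only to keep the example self-contained.

```python
import math


def cholesky(a):
    """Lower-triangular factor L with L L^T = a; fails if a is not positive definite."""
    n = len(a)
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = a[i][j] - sum(low[i][k] * low[j][k] for k in range(j))
            if i == j:
                if s <= 0.0:
                    raise ValueError("matrix is not positive definite")
                low[i][i] = math.sqrt(s)
            else:
                low[i][j] = s / low[j][j]
    return low


def forward_solve(low, b):
    """Solve low @ z = b for z by forward substitution."""
    z = []
    for i, row in enumerate(low):
        z.append((b[i] - sum(row[k] * z[k] for k in range(i))) / row[i])
    return z


def gaussian_loglik(resid, cov):
    """Log-likelihood, up to an additive constant, of residuals R - M given Sigma."""
    low = cholesky(cov)
    z = forward_solve(low, resid)  # whitened residuals
    logdet = 2.0 * sum(math.log(low[i][i]) for i in range(len(resid)))
    return -0.5 * logdet - 0.5 * sum(v * v for v in z)
```

The covariance matrix would be filled by evaluating a candidate autocorrelation function at every pair of sampled times, so irregular spacing and gaps change only the entries of the matrix and not the procedure. The positive-definiteness requirement shows up directly as a failed factorization.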

The OU autocorrelation function is given by

- (eqn 7)

where σ_{H} is the variance of the animal's utilization distribution, while *τ*_{H} is the time-scale in which the animal crosses this range. For small time-lags, *τ* = |*t*−*t*′| < *τ*_{H}, the diffusion is regular, and in the limit 1/*τ*_{H} → 0, this model reduces to Brownian motion with a diffusion rate given by the limit of σ_{H}/*τ*_{H}. For larger lags, *τ* > *τ*_{H}, the variance is asymptotically given by σ_{H} and so the displacement is subdiffusive.
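The two regimes described above can be checked numerically with a short sketch. It assumes the exponential OU form σ_{H} exp(−*τ*/*τ*_{H}) for the autocorrelation function, consistent with the correlation function quoted for the OU process in the likelihood section; the function names and the example parameter values are illustrative only.

```python
import math


def ou_acf(lag, sigma_h, tau_h):
    """OU autocorrelation function at time-lag `lag` (assumed exponential form)."""
    return sigma_h * math.exp(-abs(lag) / tau_h)


def ou_semivariance(lag, sigma_h, tau_h):
    """Semi-variance sigma_h - acf(lag), computed stably for small lags."""
    return -sigma_h * math.expm1(-abs(lag) / tau_h)


# Hypothetical parameter values, chosen only to show the two regimes.
SIGMA_H, TAU_H = 100.0, 10.0
short_lag = ou_semivariance(0.01, SIGMA_H, TAU_H)   # close to (SIGMA_H / TAU_H) * lag
long_lag = ou_semivariance(1000.0, SIGMA_H, TAU_H)  # close to SIGMA_H
```

At lags well below *τ*_{H} the semi-variance grows linearly at the rate σ_{H}/*τ*_{H}, as in regular diffusion, while at lags well above *τ*_{H} it saturates at σ_{H}.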

In Fleming *et al*. (2014), the OU model was generalized by the inclusion of random periods of autocorrelated velocity (i.e. periods of time in which an animal tended to maintain its present rate of movement or lack thereof). These ballistic periods are characterized by a short time-scale *τ*_{F}, which were hypothesized to correspond to foraging behaviour, given that a grazing ungulate tends to move in a relatively straight line as it feeds (Bailey *et al*. 1996). However, this effect also arises from CRW behaviour at short time-scales (see Gurarie & Ovaskainen 2011, for the OU velocity process). The corresponding autocorrelation function is given by

- (eqn 8)

and the quantity σ_{F} ≡ (*τ*_{F}/*τ*_{H})σ_{H} then corresponds to the range associated with the autocorrelated movements, which could possibly be associated with a foraging range. We will refer to this model as the OUF process. The OUF model contains within it the Ornstein–Uhlenbeck (OU) model and within that the classical Brownian motion (BM) model. From BM to OU to OUF, the maximized likelihood cannot decrease; however, each step adds one parameter. Like the OU process, for large lags, *τ* > *τ*_{H}, the variance is asymptotically constant, and for intermediate lags, *τ*_{F} < *τ* < *τ*_{H}, the diffusion is regular. For small lags, *τ* < *τ*_{F}, the displacement is technically superdiffusive, with the semi-variance function and mean squared displacement proportional to *τ*^{2}. However, the mean squared displacement in this regime is always less than that from the regular diffusion at intermediate lags, as the semi-variance and mean squared displacement strictly grow larger with time-lag. In the limit *τ*_{F} → 0, this model reduces to the OU model.
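The nesting of OU within OUF and the *τ*^{2} behaviour at small lags can be explored with the sketch below. The functional form used here, σ_{H}(*τ*_{H}e^{−*τ*/*τ*_{H}} − *τ*_{F}e^{−*τ*/*τ*_{F}})/(*τ*_{H} − *τ*_{F}), is an assumption: it is one stationary autocorrelation function with the limiting properties described above and is not copied from equation (8). The function names are likewise hypothetical.

```python
import math


def ouf_acf(lag, sigma_h, tau_h, tau_f):
    """Assumed OUF autocorrelation function; requires 0 <= tau_f < tau_h."""
    lag = abs(lag)
    if tau_f == 0.0:
        return sigma_h * math.exp(-lag / tau_h)  # exact OU limit
    num = tau_h * math.exp(-lag / tau_h) - tau_f * math.exp(-lag / tau_f)
    return sigma_h * num / (tau_h - tau_f)


def ouf_semivariance(lag, sigma_h, tau_h, tau_f):
    """Semi-variance: the variance minus the autocorrelation at this lag."""
    return sigma_h - ouf_acf(lag, sigma_h, tau_h, tau_f)


# Doubling a small lag should roughly quadruple the semi-variance (tau^2 growth).
ratio = ouf_semivariance(0.02, 5.0, 10.0, 1.0) / ouf_semivariance(0.01, 5.0, 10.0, 1.0)
```

Under this assumed form, the small-lag semi-variance is approximately σ_{H}*τ*^{2}/(2*τ*_{H}*τ*_{F}), and setting *τ*_{F} to zero recovers the OU autocorrelation function exactly.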

The BM, OU and OUF models all describe stationary processes, and yet ecological processes are assuredly non-stationary, as they are driven by diurnal and seasonal cycles. Although the mean and autocorrelation functions can be specified to incorporate non-stationary behaviours, this requires some additional effort. The easiest, though least informative, approach is to treat the non-stationarity as a nuisance parameter and average over all time dependence. This is effectively what is done when fitting a stationary model to non-stationary processes via maximum likelihood (Appendix S1). To construct specific parametric models of non-stationary behaviour, one can rely upon nonparametric estimates of the non-stationary mean and autocorrelation function, such as the population mean and Wigner–Ville function (Ville 1948; Hillery *et al*. 1984). Finally, we note that most ideas for addressing non-stationarity in movement ecology can be adapted to the formalism we propose, so that they better take into account the non-Markovian behaviours. In particular, behavioural change point analysis (BCPA; Gurarie, Andrews & Laidre 2009) requires only a minimal substitution of likelihood functions.

### The likelihood function

In the case of a two-dimensional stochastic movement process with isotropic correlations, the autocorrelation function can be represented

- (eqn 9)

where σ_{0} is the variance and *c*(*t*,*t*′) is a dimensionless, scalar correlation function – it carries no units of space or time and is not a matrix. For example, *c*(*t*,*t*′) = exp (−|*t*−*t*′|/*τ*_{H}) for an OU process. Under this assumption, log-likelihood function (6) simplifies to

- (eqn 10)

to within a constant, where *n* is the number of times that the movement path is sampled (i.e. the number of relocations per individual animal), the correlation matrix **C** is defined by *C*_{ij} ≡ *c*(*t*_{i},*t*_{j}), and the sample correlations are given by

- (eqn 11)

- (eqn 12)

- (eqn 13)

in terms of the data and mean vectors

- (eqn 14)

- (eqn 15)

For the OU process, the location data enter the likelihood function via the vectors **X** and **Y**, and the parameters allowed to vary are the static means μ_{x} and μ_{y}, the variance σ_{0} and the range crossing time-scale *τ*_{H}. Each time the parameter *τ*_{H} is varied, the matrix **C**, its inverse and its determinant must all be updated to evaluate the likelihood function.

The assumption of isotropy can be relaxed at the cost of additional model parameters, which can be justified by comparing AIC values or with a likelihood ratio test. We consider a two-dimensional process with anisotropic correlations that are uniform in time:

- (eqn 16)

Log-likelihood function (6) can then be expressed

- (eqn 17)

after some algebraic manipulations (using block-matrix inverse and determinant relations) to the multivariate Gaussian likelihood function. Given the likelihood function in equations (10) or (17), the mean (and its standard error) and the variance (and its standard error) can be solved (i.e. maximized or profiled) via some straightforward linear algebra (Appendix S2). This allows for model parameterization by standard maximum likelihood techniques and for model comparison via AIC or likelihood ratio test.
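For the isotropic case, the profiling step can be sketched as follows. The closed forms used here are derived from a standard Gaussian likelihood with covariance σ_{0}**C** shared by both axes, not taken from equation (10) or Appendix S2: the generalized-least-squares means, the variance estimate σ̂_{0} = *Q*/(2*n*) and the profiled log-likelihood −*n* log σ̂_{0} − log det **C** − *n*. All function names are hypothetical.

```python
import math


def cholesky(a):
    """Lower-triangular factor L with L L^T = a."""
    n = len(a)
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = a[i][j] - sum(low[i][k] * low[j][k] for k in range(j))
            if i == j:
                if s <= 0.0:
                    raise ValueError("correlation matrix is not positive definite")
                low[i][i] = math.sqrt(s)
            else:
                low[i][j] = s / low[j][j]
    return low


def whiten(low, b):
    """Solve low @ z = b by forward substitution."""
    z = []
    for i, row in enumerate(low):
        z.append((b[i] - sum(row[k] * z[k] for k in range(i))) / row[i])
    return z


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def profile_loglik(times, xs, ys, corr):
    """Profile out the means and variance for a given correlation function.

    `corr` maps a non-negative time-lag to a correlation in [-1, 1].
    Returns (log-likelihood up to a constant, mu_x, mu_y, sigma_0).
    """
    n = len(times)
    c = [[corr(abs(ti - tj)) for tj in times] for ti in times]
    low = cholesky(c)
    z1 = whiten(low, [1.0] * n)
    zx, zy = whiten(low, xs), whiten(low, ys)
    norm = dot(z1, z1)                  # 1^T C^-1 1
    mu_x = dot(z1, zx) / norm           # generalized least-squares means
    mu_y = dot(z1, zy) / norm
    rx = [a - mu_x * b for a, b in zip(zx, z1)]
    ry = [a - mu_y * b for a, b in zip(zy, z1)]
    q = dot(rx, rx) + dot(ry, ry)       # quadratic form Q summed over both axes
    sigma0 = q / (2 * n)
    logdet_c = 2.0 * sum(math.log(low[i][i]) for i in range(n))
    return -n * math.log(sigma0) - logdet_c - n, mu_x, mu_y, sigma0
```

With the means and variance profiled out, only the time-scale parameters need a numerical search. For an OU model, for example, one could evaluate `profile_loglik(times, xs, ys, lambda lag: math.exp(-lag / tau_h))` over a grid of candidate `tau_h` values and keep the maximum.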

The non-Markovian likelihood functions we have introduced take into account an animal's autocorrelations over all possible time-lags, regardless of the model chosen and sampling rate employed. Our approach stands in contrast to conditional, Markovian likelihood functions that have been used in ecology to estimate the parameters of Brownian and OU movement processes (Dunn & Gipson 1977; Brillinger & Stewart 1998; Blackwell 2003; Horne *et al*. 2007; Johnson *et al*. 2008; Gurarie, Andrews & Laidre 2009). The Markovian likelihood functions are conditioned upon data differences and are constructed to be uncorrelated under the assumption of a BM or OU model. The Markov property vastly simplifies the resulting likelihood functions, which do not contain any matrix operations if only non-overlapping segments of time are considered. There are three significant drawbacks to this construction. First, the Markov property only applies under the assumption that the particular model is correct, and this assumption only holds when the model is a very accurate representation of the process that generated the data *at the specific sampling rate* for which the differences are calculated. In contrast to this, our more general non-Markovian likelihood function accounts for all possible lags in the data via the correlation matrix *C*_{ij} = *c*(*t*_{i},*t*_{j}), and therefore, it is much more robust to situations where the stochastic movement process is misspecified. Second, we prove in Appendix S3 that for any realistic movement process *x*(*t*) that has continuous and bounded velocities, with fine sampling rate Δ*t* = *t*_{i+1} − *t*_{i}, the diffusion rates and corresponding standard errors estimated by Markovian likelihood methods will be proportional to Δ*t*, which vanishes in the limit of continuously sampled data. Finally, by construction, the Markovian likelihood functions are limited to a specific order of continuous-time ARMA models. 
For instance, the Markovian likelihood function that conditions upon pairs of locations ℓ(*x*_{i},*x*_{j}) can be applied to compare the AIC values of the BM and OU models, but it cannot be applied to the OUF model, which requires location triplets ℓ(*x*_{i},*x*_{j},*x*_{k}). In contrast, the more general likelihood function can be applied to arbitrary mean and autocorrelation functions.

### Analysis of gazelle movement

Overall the anisotropic individual-level OUF model was best supported by the data. We plot the likelihood profiles for this model in Fig. 3 and summarize the model fits in Table 2. Starting with the population-level isotropic OUF model, which was selected by variogram analysis, we fit the models to each of the gazelles. The gazelles had individual sample sizes ranging from 20 to 886 relocations. Allowing each gazelle to have independent time and length scale parameters, rather than a single set of scale parameters describing the entire population of gazelles, resulted in a further AIC decrease of 2690, indicating genuine variability in scales among the gazelles (Table 2). Compared to the population scales, the individual *τ*_{F} range up to a day and *τ*_{H} range down to tens of days (Fig. 3). This places the previous population-level results of variogram regression (Fleming *et al*. 2014; Table 2) within the range of individual variability. Allowing each gazelle to have an independent anisotropic covariance *σ*_{0} resulted in a further AIC decrease of 416, with very little effect on estimates of the time-scale parameters.

To repeat the model selection of our variogram analysis (Fleming *et al*. 2014), we also considered the OU and BM models. The isotropic OU model resulted in a marginally larger range crossing time *τ*_{H} estimate and was roughly consistent with some aspects of the data, while the BM model was found to be a very poor model of gazelle movement (Appendix S7). For comparison purposes, we also fit the data with the Markovian OU likelihood function that conditions only upon differences in the data, which corresponds to the techniques of Gurarie, Andrews & Laidre (2009). Relative to all non-Markovian estimates (including OU), this method resulted in an approximately 50% smaller range crossing time estimate *τ*_{H}, with grossly underestimated confidence intervals (Table 2). Moreover, as we prove in Appendix S3, in the limit of continuously sampled data, the Markovian likelihood function produces an estimate of *τ*_{H} that diverges. Ironically, the Markovian likelihood actually benefited in this case from the relatively infrequent sampling and excessive gapping in the data. This means that if one could improve the data quality by hand, perhaps by tracking down the collar and extracting data that was not uploaded to the satellite, then the Markovian likelihood estimate of *τ*_{H} would increase through the correct range of values and ultimately become arbitrarily large, yet with deceptively small confidence intervals.

We also performed a data simulation using the same sampling schedule, maximum likelihood (ML) parameter estimates and estimated GPS errors, as detailed and performed in Fleming *et al*. (2014). The results of an OUF fit showed marked improvement over the variogram regression and Markovian likelihood function, with the true values of all parameters being contained inside the 95% confidence intervals of the non-Markovian ML estimates. The simulations also served to demonstrate that *τ*_{H} was genuinely difficult to estimate with the given data schedule, even with the underlying model being correctly specified, as the confidence intervals for *τ*_{H} were as large with the simulated data as they were with the real data. This results from the fact that the period of the data samples, *T* = *t*_{n}−*t*_{1}, was not significantly larger than the true value of *τ*_{H}, as the effective sample size necessary to estimate a long time-scale *τ*_{c} in a single time series is roughly given by *T*/*τ*_{c}, and not by the total amount of data *n*.
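The effective-sample-size argument can be made concrete with a small calculation. The function name and the numbers in the example are hypothetical and are not taken from the gazelle data.

```python
def effective_sample_size(first_time, last_time, tau_c):
    """Rough effective sample size T / tau_c for estimating a time-scale tau_c.

    T = last_time - first_time is the span of the record, in the same
    time units as tau_c.
    """
    if tau_c <= 0:
        raise ValueError("tau_c must be positive")
    return (last_time - first_time) / tau_c


# Hypothetical record: 886 fixes spread over 400 days, with tau_c = 100 days.
n_fixes = 886
n_effective = effective_sample_size(0.0, 400.0, 100.0)
```

In this hypothetical case the record contains only about four independent range crossings, however many fixes were collected, which is why the confidence interval on a long time-scale can remain wide.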

### Analysis of gazelle area utilization

Using the probability relationships outlined in Appendix S6, the population-level anisotropic OUF model resulted in a ranging area of *A*_{95%} = 77 600 ± 1200 km^{2}, which we plot in Fig. 4, with *A*_{P} referring to the most probable area in which the 36 gazelles in the sample population will be located a proportion *P* of all time. The breakdown of the individual area scales is given in Fig. 6 and the joint utilization distribution of the 36 collared gazelles is given in Fig. 2, resulting in a joint-distribution area of km^{2}. The joint-distribution area of the full population of ∼1·1 million gazelles will be even larger. However, the joint-distribution area of a mere 36 gazelles already almost completely fills the total available area of the Eastern Steppe, when bounded by the fenced borders to China and Russia, as well as the Trans-Mongolian railway, which is also fenced. Therefore, one can predict that an analysis featuring more individuals would indicate that this entire area of land is being utilized (Fig. 5), and so its encroachment would have a significant impact on gazelle behaviour.

For comparison purposes, we also fit the data with both Markovian and non-Markovian likelihood functions for the isotropic OU model. The non-Markovian likelihood function yielded a utilization range estimate similar to that of the OUF model, whereas the Markovian likelihood function resulted in a significantly smaller ranging area estimate of 30 120 ± 660 km^{2}. This mismatch is another indication of significant bias in the Markovian likelihood function, as even a bivariate Gaussian estimate (for an anisotropic 2D Gaussian distribution without autocorrelation) yields an area estimate of *A*_{95%} = 63 480 ± 500 km^{2}. In other words, the Markovian likelihood function has produced an area estimate which is even worse than the simplest possible analysis. By contrast, weighted variogram regression, which is also a convenient estimator, yielded a more reasonable estimate of *A*_{95%} = 91 000 ± 21 000 km^{2} (Fleming *et al*. 2014).
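As a point of reference for the bivariate Gaussian estimate mentioned above, the area of the region containing a proportion *P* of a 2D Gaussian distribution has a closed form: the ellipse {**x** : **x**ᵀΣ⁻¹**x** ≤ *c*} has area π*c*√(det Σ), and for two dimensions *c* = −2 ln(1 − *P*). The sketch below is a minimal illustration of that formula only. The function name is hypothetical, and it does not reproduce the probability relationships of Appendix S6 or the confidence intervals reported here.

```python
import math

def gaussian_area(cov, p=0.95):
    """Area of the ellipse containing a proportion p of a 2D Gaussian.

    cov is the 2x2 covariance matrix, given as ((sxx, sxy), (syx, syy)).
    """
    (sxx, sxy), (syx, syy) = cov
    det = sxx * syy - sxy * syx
    if det <= 0.0:
        raise ValueError("covariance matrix must be positive definite")
    # For 2 degrees of freedom, P(chi^2 <= c) = 1 - exp(-c / 2).
    c = -2.0 * math.log(1.0 - p)
    return math.pi * c * math.sqrt(det)
```

With a covariance matrix in km^{2}, the returned area is also in km^{2}. Anisotropy enters only through the determinant, so stretching one axis by a factor *k* scales the area by the same factor.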

### Discussion

We have introduced a rigorous method of modelling animal movement focusing on the first two cumulants of a stochastic process: the mean and autocorrelation function, which are sufficient to describe drift and diffusion (Adelman 1976; Calzetta, Roura & Verdaguer 2003). This method can incorporate a diverse range of behaviours, including home-range fidelity, searching and foraging. Using different movement models beyond the three discussed here (BM, OU and OUF) allows the approach to be extended to additional behaviour types, such as migration. We express the models by their mean and autocorrelation functions and then fit them to the data by maximizing the (non-Markovian) likelihood function. This novel approach to movement analysis allows us to identify mixtures of movement processes at the individual level, along with their corresponding characteristic time and length scales.
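To illustrate what fitting a model "by its mean and autocorrelation function" involves, the following is a minimal sketch of a Gaussian log-likelihood for a one-dimensional OU process, in which the covariance between any two observations depends on their time-lag. It is an illustration under simplifying assumptions, not the implementation used in this paper: the mean is treated as known, telemetry error is ignored, sampling times must be distinct, and all function names are hypothetical. Because every pair of observations enters through the covariance matrix, irregular sampling and gaps require no special treatment.

```python
import math

def ou_autocorrelation(t1, t2, variance, tau):
    """OU covariance between positions at times t1 and t2 (assumed form)."""
    return variance * math.exp(-abs(t1 - t2) / tau)

def cholesky(matrix):
    """Lower-triangular Cholesky factor of a symmetric positive-definite matrix."""
    n = len(matrix)
    lower = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(lower[i][k] * lower[j][k] for k in range(j))
            if i == j:
                lower[i][j] = math.sqrt(matrix[i][i] - s)
            else:
                lower[i][j] = (matrix[i][j] - s) / lower[j][j]
    return lower

def gaussian_log_likelihood(times, positions, mean, variance, tau):
    """Log-likelihood of all positions jointly, conditioning on every time-lag."""
    n = len(times)
    cov = [[ou_autocorrelation(ti, tj, variance, tau) for tj in times]
           for ti in times]
    lower = cholesky(cov)
    # Forward substitution: solve lower * z = (positions - mean).
    resid = [x - mean for x in positions]
    z = []
    for i in range(n):
        s = sum(lower[i][k] * z[k] for k in range(i))
        z.append((resid[i] - s) / lower[i][i])
    log_det = 2.0 * sum(math.log(lower[i][i]) for i in range(n))
    quad = sum(v * v for v in z)
    return -0.5 * (n * math.log(2.0 * math.pi) + log_det + quad)
```

Maximizing this function over `variance` and `tau` (for example with a generic numerical optimizer) would give parameter estimates for this simplified model. Other movement models would be handled by swapping in a different autocorrelation function.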

The flow of analysis we advocate (Fig. 1) involves three stages, with the likelihood method being central. In the first stage, one applies nonparametric estimators, such as variogram analysis (Fleming *et al*. 2014), to identify reasonable movement models and obtain rough estimates of the movement parameters. The identified movement models should be capable of explaining the statistically significant and visually apparent features of the mean, variogram and periodogram. In stage two, a likelihood analysis is used to obtain more accurate parameter estimates, reliable confidence intervals and a more efficient use of the data, allowing for fine-grained analyses such as the estimation of anisotropy and individual-level variability. In stage three, the best-fit mean and autocorrelation functions are applied to obtain reliable and statistically efficient estimates of quantities of interest that are conditioned upon movement, and therefore based upon the autocorrelation structure of the data.
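As a rough illustration of the first stage, the sketch below computes an empirical semi-variogram, γ(τ) = ½⟨[x(t + τ) − x(t)]²⟩, for a one-dimensional, evenly sampled track. This is a simplified textbook estimator with a hypothetical function name. It is not the weighted estimator of Fleming *et al*. (2014), and it does not handle gaps or irregular sampling.

```python
def semivariogram(positions, max_lag):
    """Empirical semi-variance at lags 1..max_lag (in units of the sampling interval).

    positions is an evenly sampled 1D series; max_lag must be smaller
    than the number of positions.
    """
    result = []
    for lag in range(1, max_lag + 1):
        squared_diffs = [(positions[i + lag] - positions[i]) ** 2
                         for i in range(len(positions) - lag)]
        result.append(0.5 * sum(squared_diffs) / len(squared_diffs))
    return result
```

A curve that keeps rising with lag suggests an unbounded process such as BM, whereas a curve that levels off at an asymptote suggests range residency, as in the OU and OUF models.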

There are four advantages of our likelihood approach over variogram regression: heightened accuracy; reliable confidence-interval estimation (Fleming & Calabrese 2013); robustness to irregularities in the sampling schedule; and the ease with which it can incorporate non-stationary movement behaviours, such as migration. However, the likelihood function requires that researchers identify a suitable choice of movement model. For this purpose, a nonparametric estimator, such as the variogram approach, remains useful as a model identification tool. Our analysis has confirmed the existence of multiscale movement processes suggested by the population-level variogram analysis of Fleming *et al*. (2014), from small-scale foraging to an asymptotic (multiyear) range (Table 2). The superior efficiency of maximum likelihood analysis allowed us to obtain parameter estimates at the individual level and in each of the spatial dimensions independently. This analysis revealed significant variability among individuals as well as pronounced directional bias in their movement.

Like the variogram method introduced in Fleming *et al*. (2014), our non-Markovian likelihood approach conditions upon all time-lags in the data, thereby avoiding the sampling problem inherent in currently standard movement analyses. For the conventional step-length and turn-angle analysis, such limitations are well known (Bovet & Benhamou 1988; Turchin 1998; Codling & Hill 2005; Gautestad 2012; Fleming *et al*. 2014). However, we have demonstrated that similar limitations also apply to Markovian likelihood functions that condition upon differenced data (see Appendix S3). In contrast to ad hoc estimators such as minimum convex polygons and kernel densities, the home-range and area utilization estimation provided by our likelihood analysis is inherently mechanistic, as it conditions upon the appropriate autocorrelation structure present in the data. Moreover, as the underlying model is stochastic and self-consistent, there is no fine tuning of ancillary parameters, such as a degree of refinement or bandwidth size. Compared to mechanistic home-range analysis (MHRA) (Moorcroft, Lewis & Crabtree 2006; Moorcroft & Lewis 2006), the parametric home-range estimator introduced here is limited because the stationary distributions of individuals will always be Gaussian. This stationary approach is inappropriate for migratory species. However, this limitation can be resolved through the use of non-stationary movement processes and related approaches, which we leave for a future effort. Two factors give our non-Markovian likelihood estimator an advantage over MHRA. First, MHRA requires a much more detailed understanding of the underlying movement mechanisms. Second, MHRA conditions upon advection-diffusion equations, which are inherently incapable of generating the multitime correlations that can be generated by Langevin equations (Calzetta, Roura & Verdaguer 2003), which are the stochastic (and possibly integro-differential) equations of motion for the animal trajectories. The limitation implies that the larger class of non-Markovian processes, and OUF in particular, cannot be modelled with MHRA, at present. On the other hand, the MHRA has a more detailed and mechanistic description of the movement process, and so it allows for the prediction of movement behaviours under hypothetical conditions.

The likelihood analysis is efficient and robust enough to fit to each individual gazelle, which paves the way for many future analyses that require individual-level estimates of movement path autocorrelation structure. Quantities of interest that require such estimates (Fig. 1) include optimal (nonparametric) home-range estimation, interpolation and forecasting of animal locations, path length and instantaneous velocity estimation, and transitions between behavioural states. The individual-level analysis, which was precluded in the variogram analysis by too little data (Fleming *et al*. 2014), also enables the exploration of hypotheses on the relationships between different characteristic movement scales. Although no obvious trends presented themselves in Figs 3 and 6, any relation between time-scales, length scales, diffusion or rate scales can be tested with the likelihood function. It is clear from our analysis that the population scales are well defined by a single global maximum in the likelihood profile, but the individual scales vary significantly within the population (Table 2). This naturally suggests the hypothesis that some of the variability in movement behaviour among individuals may be attributable to variability in the environment. We will explore this idea in the future using individual-level analyses that relate movement processes to environmental covariates.