Changing measurements or changing movements? Sampling scale and movement model identifiability across generations of biologging technology

1. Animal movement patterns contribute to our understanding of variation in breeding success and survival of individuals, and the implications for population dynamics.
2. Over time, sensor technology for measuring movement patterns has improved. Although older technologies may be rendered obsolete, the existing data are still valuable, especially if new and old data can be compared to test whether a behaviour has changed over time.
3. We used simulated data to assess the ability to quantify and correctly identify patterns of seabird flight lengths under observational regimes used in successive generations of tracking technology.
4. Care must be taken when comparing data collected at differing time-scales, even when using inference procedures that incorporate the observational process, as model selection and parameter estimation may be biased. In practice, comparisons may only be valid when degrading all data to match the lowest resolution in a set.
5. Changes in tracking technology that lead to aggregation of measurements at different temporal scales make comparisons challenging. We therefore urge ecologists to use synthetic data to assess whether accurate parameter estimation is possible for models comparing disparate data sets before conducting analyses such as responses to environmental changes or the assessment of management actions.

Over recent decades, advances in animal tracking and biologging technology have provided an enormous number of increasingly precise measurements of animal movement paths (Block et al., 2011;Hays et al., 2016;Hussey et al., 2015;Kays, Crofoot, Jetz, & Wikelski, 2015;Phillips, Croxall, Silk, & Briggs, 2007). Researchers use satellite transmitters or data loggers to collect location information; these provide data at intervals which are often constrained by battery power or memory capacity (Edwards et al., 2007;Fedak, Lovell, McConnell, & Hunter, 2002;Shillinger et al., 2012). In large birds and mammals, particularly in recent years, the deployment of GPS loggers, and, in marine animals, of saltwater immersion or temperature-depth loggers, has generated a wealth of tracking data at high temporal resolution and at relatively low cost (Block et al., 2011;Mackley et al., 2010;Scales et al., 2016). However, data collected previously using VHF and satellite transmitters (platform terminal transmitters or PTTs), older GPS and immersion loggers, or by human observers are also available, albeit at coarser spatial and/or temporal resolution (Edwards et al., 2007;Froy et al., 2015). The increasing use of tracking and biologging technology has also been accompanied by initiatives to archive, share, and exchange animal tracking data (Birdlife International, 2004;Kranstauber et al., 2011). This wealth of existing data creates opportunities for informative comparisons between archived and new behavioral data.
However, the different recording resolutions add complications both in terms of methodology and interpretation.
With the burgeoning of biologging and other ecological research, detailed observations are now available that span a time period that is relevant to the temporal scales of demographic processes, even for long-lived animals, as well as changes in the Earth's climate (Crossin et al., 2014;Hazen et al., 2013). New research avenues have therefore opened for using biologging data to study how movement patterns may be changing across time, including in response to environmental variation (Hays et al., 2016). Most studies that deploy tracking devices on animals such as seabirds aim to answer broad ecological questions about habitat use and foraging behavior in one or a few successive years, as opposed to describing patterns of movement across time frames longer than a decade (but see Bogdanova et al., 2014;Carneiro et al., 2016). Consequently, device sampling intervals may be suboptimal for learning about movement over the longer term in post hoc studies. Assessing whether movement strategies have changed requires robust methods and movement models that allow the synthesis of data sets collected at different temporal scales, with differing accuracy, and often with different research aims at the outset.
There are many ways to describe and quantify movement patterns of animals that depend on the type and quality of available data.
Many models of foraging assume that organisms move diffusively, that is, that animals perform uncorrelated Brownian walks as they search for food (Johnson, Wiens, Milne, & Crist, 1992). However, for most animals, the Brownian assumption is clearly inadequate (Turchin, 1998). Superdiffusive descriptions of movement, such as Lévy walks or flights (Shlesinger, Zaslavsky, & Frisch, 1995;Viswanathan, 2010;Watkins et al., 2005) or intermittent search strategies (Bénichou, Loverdo, Moreau, & Voiturz, 2006), which describe movement as small jumps interspersed with occasional longer jumps, are popular alternatives to standard diffusion models as they allow for more complex patterns. Lévy walks, which model movements with step lengths determined by a power-law distribution, were first applied in ecology to describe the foraging strategies of wandering albatrosses, Diomedea exulans (Viswanathan et al., 1996), and have since been used to describe search or foraging strategies across many different biological systems (e.g., see references in Edwards et al., 2007). They were also shown theoretically to represent optimal search strategies for revisitable targets when the targets are fractally distributed (Viswanathan et al., 1999). However, the validity of Lévy flights as descriptions of animal foraging movements is hotly debated in the ecological literature (Auger-Methe, St Clair, Lewis, & Derocher, 2011;Buchanan, 2008;Edwards et al., 2007;Humphries et al., 2010;Reynolds, 2012;Travis, 2007;Viswanathan, Da Luz, Raposo, & Stanley, 2011). For instance, although the initial study on albatrosses indicated a Lévy pattern of foraging (Viswanathan et al., 1996), after correcting and augmenting the original data, and utilizing improved statistical methods, a later study by Edwards et al. (2007) showed that the Lévy flight model was not supported; instead, flight times were more likely to be gamma-distributed.
Subsequent studies claim new evidence for Lévy-like behavior in certain marine predators (Focardi & Cecere, 2014;Hays et al., 2012;Humphries et al., 2010;Reynolds, Paiva, Cecere, & Focardi, 2016;Sims, Humphries, Bradford, & Bruce, 2012;Sims et al., 2008), and that humans exhibit more complex behaviors (González, Hidalgo, & Barabási, 2008). Further studies on albatrosses have concluded that foraging patterns of some (although far from all) individuals are well described by modified Lévy flights or Brownian movement in various contexts, and further concluded that birds utilizing this method are able to consume considerably more prey than they need to satisfy their own energy requirements (Humphries, Weimerskirch, Queiroz, Southall, & Sims, 2012). Thus the evidence on Lévy flights in nature is decidedly mixed.
In this study, we use foraging data from albatross species collected a decade apart to explore how the changes in logger technology (and hence the scale and mode of sampling), modeled distributions (statistical fitting), and the treatment of both data and distributions may influence the findings and our ability to infer and compare behavior over time. Our analyses focus on a particular type of data from loggers which detect and record saltwater immersion, providing information on wet and dry periods (so called immersion loggers; Edwards et al., 2007;Mackley et al., 2010).
Although a geographic location was not available in some of the earlier deployments, PTT or GPS data have been collected concurrently with the immersion data in the past 20 years, providing improved insights into movements and habitat use (Carneiro et al., 2016;Mackley et al., 2010;Scales et al., 2016). Previously, a major consideration was memory capacity, which led to alternative ways of sampling and storing data, which were aggregated at different timescales on the device during the deployment. The aims of our study were to evaluate model and parameter identifiability for different generations of immersion loggers, using synthetic data sets reflecting different sampling regimes. We further investigated parameter estimation for actual data collected in the wild.

| METHODS
In order to determine to what extent the inference of underlying foraging patterns is influenced by the data collection method, we combine a simulation study with an analysis of two suites of data on flights and water landings at sea collected a decade apart, 1992-1993 and 2002-2004, from wandering (D. exulans) and black-browed albatrosses (Thalassarche melanophris). As these data comprise segments of behaviors that have variously been referred to as steps, trips, tracks, flights, etc., a glossary is provided in Table 1.

| Inference procedure
All flights are assumed to come from one of four possible distributions: (shifted) exponential; (shifted) gamma; (shifted) q-exponential; Pareto. Details of the distributions are given in Appendix S2. These true flights are then resampled with (real or virtual) data loggers that discretize or aggregate the flights.
We use two approaches to infer the parameters of the underlying process from the data. One is to take a "naive" maximum likelihood approach, that is, to ignore the observational process and instead assume that the observed data are drawn without noise from the underlying distribution. The other is to use a multinomial maximum likelihood approach that explicitly models the observational process (Edwards et al., 2007). In this case, the log-likelihood of the parameters θ, given a record r (a set of observations, see Table 1), is given by Equation (1). One assumption of this multinomial model is that there is some biological lower limit to the possible flight in terms of the length of time spent dry. For instance, if a bird extended its foot out of the water to scratch its head, the logger would record that event as a dry interval; however, ideally, these events would be excluded from any analysis of flights. Following Edwards et al. (2007), Reynolds et al. (2016), and others, we use a lower limit of flight duration (part of an overall trip) of 30 s, on the biological assumption that anything shorter is not likely to be a flight to a different food patch. This lower limit to flight time is built into the exponential, gamma, and q-exponential distributions as a shift, and into the Pareto as the lower set point (see Appendix S2 for details).
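As a rough guide to the shape of such a likelihood, a generic multinomial log-likelihood for binned counts can be written as below. The notation is assumed here for illustration and may not match Equation (1): $r_j$ is the number of flights recorded as lasting $j$ intervals, $p_j(\theta)$ is the probability that a flight drawn from the underlying distribution is recorded as $j$ intervals under the observation process, and $J$ is the longest recorded flight.

```latex
\ell(\theta \mid r) \;=\; \sum_{j=1}^{J} r_j \,\log p_j(\theta) \;+\; \text{const},
\qquad \sum_{j} p_j(\theta) = 1 .
```

The constant collects the multinomial coefficient, which does not depend on $\theta$ and so does not affect the maximum likelihood estimate.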

| Simulation studies
We used a suite of simulations to explore the effect of the different logger sampling schemes (specifically the timescales over which data are aggregated) on our ability to correctly infer parameter values of a known model and to choose the true model if we treat it as unknown.
First, we generated a series of "true" flights drawn directly from the known distributions without the observation process. For each of the four distributions, we specified four parameter sets for a total of 16 underlying flight-time distributions. When possible, we chose parameters so that the theoretical means between the four sets of parameters corresponded between distributions, to ensure that the scales of the processes were comparable. For each of the 16 flight distributions, we created 10 simulated data sets of length 3,000 (i.e., 10 sets of 3,000 flights).
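A minimal sketch of this generation step is shown below, using only the Python standard library. The parameter names (`mean`, `shape`, `scale`, `alpha`) and the parameterizations are assumptions for illustration; the definitions actually used are in Appendix S2, and the q-exponential is omitted here because it has no standard-library sampler.

```python
import random

MIN_FLIGHT_S = 30.0  # lower limit on flight duration from the text (30 s)

def simulate_flights(dist, params, n=3000, rng=None):
    """Draw n 'true' flight durations in seconds, before any observation process."""
    if rng is None:
        rng = random.Random()
    if dist == "exponential":
        # Shifted exponential: 30 s plus an exponential with the given mean.
        return [MIN_FLIGHT_S + rng.expovariate(1.0 / params["mean"]) for _ in range(n)]
    if dist == "gamma":
        # Shifted gamma with hypothetical shape and scale parameters.
        return [MIN_FLIGHT_S + rng.gammavariate(params["shape"], params["scale"])
                for _ in range(n)]
    if dist == "pareto":
        # Pareto with its lower set point at 30 s.
        return [MIN_FLIGHT_S * rng.paretovariate(params["alpha"]) for _ in range(n)]
    raise ValueError(f"unknown distribution: {dist}")

def simulate_replicates(dist, params, n_sets=10, n=3000, seed=0):
    """Create n_sets independent data sets of n flights each."""
    rng = random.Random(seed)
    return [simulate_flights(dist, params, n, rng) for _ in range(n_sets)]

# Hypothetical parameter values, chosen only to show the call.
gamma_sets = simulate_replicates("gamma", {"shape": 2.0, "scale": 600.0})
```

Passing one seeded generator through all replicates keeps the whole suite reproducible from a single seed.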

TABLE 1 A glossary of terms describing movement paths used in this study
Term: Definition
Trip: A trip is assumed to be one foraging excursion, beginning when the animal leaves the nest site and ending when it returns. A trip is comprised of flights interspersed with (water) landings
Flight: Flights are the subcomponents of a trip, the units of space or time between prey capture attempts, in which the bird is actively flying
Step: In tracking studies of terrestrial animals, this is more commonly used to describe distance, rather than time, and again, represents the sub-unit of a trip. Here, we use it interchangeably with flight
Segment: A discrete time unit over which the wet/dry status of the bird is measured. These segments may be aggregated into longer intervals
Interval: The period over which aggregation of one or more wet/dry segments occurs. In the interval, the numbers of wet and dry segments are recorded. Flights are comprised of integer numbers of consecutive completely dry intervals. For data at high (time) resolution, the segment and interval timescales may be the same
Record: The counts of flight lengths (in intervals) within or across trips

For each of the 160 simulated data sets, we then "observed" the data using our two most extreme sampling regimes, that is, intervals of either 1 hr or 10 s, corresponding to the sampling intervals used in field deployments in 1992 and 2004, respectively (Edwards et al., 2007). More details of the algorithm are in section 3 in Appendix S3. This resulted in 320 simulated data sets of length ≤3,000 (as the aggregation can result in a subset of flights being labeled as nonflights and thus discarded).
These simulated data were used in the following two simulation studies.
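The observation step can be illustrated with a simplified sketch. It is not the algorithm of Appendix S3. It assumes, hypothetically, that flights alternate with wet bouts of known positive duration, that the logger clock starts at time zero, and that a flight is recorded as the number of intervals lying entirely within it (following the glossary definition of a flight as consecutive completely dry intervals).

```python
import math

def observe_flights(flights_s, wet_s, interval_s):
    """Return observed flight lengths in whole intervals.

    flights_s and wet_s are alternating dry and wet durations in seconds
    (flight, wet bout, flight, wet bout, ...). Flights that do not cover
    any complete interval are dropped, as they would not be seen as flights.
    """
    observed = []
    t = 0.0
    for flight, wet in zip(flights_s, wet_s):
        start, end = t, t + flight
        first = math.ceil(start / interval_s)  # first interval starting at or after take-off
        last = math.floor(end / interval_s)    # last interval boundary at or before landing
        n_dry = last - first                   # complete intervals inside the flight
        if n_dry > 0:
            observed.append(n_dry)
        t = end + wet                          # next flight starts after the wet bout
    return observed

fine = observe_flights([25, 100, 3], [5, 5, 5], 10)      # 10 s intervals
coarse = observe_flights([25, 100, 3], [5, 5, 5], 3600)  # 1 hr intervals
```

With these toy inputs the coarse regime records no flights at all, which is the mechanism behind the "length ≤3,000" remark above.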

| Parameter identifiability
Using the 320 simulated data sets, we attempted to infer the parameters of the underlying flight-time model that generated each data set. For instance, if the underlying model was an exponential, we fit only the exponential model. We used both a naive maximum likelihood estimate (MLE; i.e., one that excluded the observational process) and the multinomial likelihood with the appropriate observational process. The inferred parameters were then compared to the true parameters that generated the flights.
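To illustrate why the two estimators can differ, the sketch below compares them for a shifted exponential under a deliberately simplified observation model: each flight is recorded as the whole number of intervals it spans, and flights spanning no complete interval are dropped. This is an assumption made for illustration, not the observation process of Appendix S3. Under it, the recorded count is geometric, which gives a closed-form estimate when the interval is at least as long as the 30 s shift.

```python
import math
import random

SHIFT_S = 30.0  # lower limit on flight duration from the text

def observe(flights_s, interval_s):
    """Simplified observation: whole intervals per flight, zero counts dropped."""
    counts = [int(x // interval_s) for x in flights_s]
    return [k for k in counts if k >= 1]

def naive_mean(counts, interval_s):
    """Naive MLE of the exponential mean: treat k * interval as the exact duration."""
    durations = [k * interval_s for k in counts]
    return sum(durations) / len(durations) - SHIFT_S

def censored_mean(counts, interval_s):
    """MLE of the exponential mean when only counts k >= 1 are observed.

    Under the simplified model k is geometric with success probability
    p = 1 - exp(-interval / mean), so p_hat = 1 / mean(k).
    Requires interval_s >= SHIFT_S and mean(k) > 1.
    """
    k_bar = sum(counts) / len(counts)
    return -interval_s / math.log(1.0 - 1.0 / k_bar)

# Hypothetical example: true mean of 30 min observed at 1 hr intervals.
rng = random.Random(42)
true_mean = 1800.0
flights = [SHIFT_S + rng.expovariate(1.0 / true_mean) for _ in range(50000)]
counts = observe(flights, 3600.0)
naive_est = naive_mean(counts, 3600.0)
censored_est = censored_mean(counts, 3600.0)
```

The sample here is much larger than the 3,000 flights per simulated data set, so the example shows the direction of the naive bias rather than the precision achievable in practice.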

| Model identifiability
Using the 320 simulated data sets, we fit all four of the possible flight distribution models using the exact (multinomial) likelihood. For each of the 320 data sets, we calculated the Akaike information criterion (AIC, Akaike, 1973;Bozdogan, 1987), the Bayesian information criterion (BIC, Raftery, 1986;Schwarz et al., 1978), and the approximate model probabilities based on BIC [Burnham & Anderson, 2004 and given by Equation (2), below], and used these to select the best model for each data set.
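The two criteria follow their standard definitions and can be computed from a fitted model's maximized log-likelihood as sketched below. The function names and the example log-likelihood values are hypothetical, and taking the number of observations to be the number of recorded flights is an assumption.

```python
import math

def aic(log_lik, n_params):
    """Akaike information criterion: 2k - 2 ln L."""
    return 2.0 * n_params - 2.0 * log_lik

def bic(log_lik, n_params, n_obs):
    """Bayesian information criterion: k ln n - 2 ln L."""
    return n_params * math.log(n_obs) - 2.0 * log_lik

def best_model(scores):
    """Name of the model with the smallest criterion value."""
    return min(scores, key=scores.get)

# Hypothetical maximized log-likelihoods and parameter counts for one data set.
fits = {"exponential": (-5210.0, 1), "gamma": (-5150.0, 2), "pareto": (-5300.0, 1)}
n_flights = 3000
bic_scores = {name: bic(ll, k, n_flights) for name, (ll, k) in fits.items()}
chosen = best_model(bic_scores)
```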

| Immersion data analysis
Observational data on flights and water landings were obtained from immersion loggers deployed on the legs of individual wandering albatrosses and black-browed albatrosses from Bird Island, South Georgia (54°00′S, 38°03′W). Multiple types of loggers were deployed between 1992 and 2004. The data are summarized in Table 2.

| Flight length calculation from immersion data
Wet/dry records were parsed at the highest temporal resolution for each type of logging device, before flight lengths were calculated by merging consecutive time periods recorded as dry (Appendix S1).
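The merging step amounts to run-length encoding of the dry segments. A minimal sketch follows, assuming the record has already been parsed into one wet/dry flag per segment; the procedure actually used is in Appendix S1, and the function names are hypothetical.

```python
from itertools import groupby

def flight_lengths(is_wet):
    """Lengths, in segments, of each run of consecutive dry segments."""
    return [sum(1 for _ in run) for wet, run in groupby(is_wet) if not wet]

def flight_durations(is_wet, segment_s):
    """Flight durations in seconds for a fixed segment length."""
    return [n * segment_s for n in flight_lengths(is_wet)]

# Toy record: True marks a wet segment, False a dry one.
record = [False, False, True, False, False, False, True, True, False]
lengths = flight_lengths(record)
durations = flight_durations(record, 10)
```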

| Model fitting and model selection
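All four flight distributions under the multinomial likelihood with appropriate discretization and aggregation parameters were fit to all sets of data noted in Table 2. Models for each data set were ranked via BIC. Further, approximate model probabilities $p(M_i)$ (based on BIC, Burnham & Anderson, 2004) were calculated as:

$$p(M_i) = \frac{\exp\left(-\tfrac{1}{2}\,\mathrm{BIC}_i\right)}{\sum_{r=1}^{R}\exp\left(-\tfrac{1}{2}\,\mathrm{BIC}_r\right)} = \frac{\exp\left(-\tfrac{1}{2}\left(\mathrm{BIC}_i-\mathrm{BIC}_{\min}\right)\right)}{\sum_{r=1}^{R}\exp\left(-\tfrac{1}{2}\left(\mathrm{BIC}_r-\mathrm{BIC}_{\min}\right)\right)} \qquad (2)$$

where $R$ is the total number of models being considered, and $\mathrm{BIC}_{\min}$ is the minimum BIC value across those models. The second expression is more numerically stable, and so is the one we use in our calculations.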
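A short sketch of the numerically stable form, with a hypothetical function name and made-up BIC values. Subtracting the minimum BIC before exponentiating keeps the largest weight at exactly 1; for BIC values in the hundreds of thousands, the unshifted exponentials would all underflow to zero in double precision.

```python
import math

def bic_model_probabilities(bics):
    """Approximate model probabilities from a list of BIC values."""
    bic_min = min(bics)
    # Shifted weights: the best model gets exp(0) = 1, the rest are smaller.
    weights = [math.exp(-0.5 * (b - bic_min)) for b in bics]
    total = sum(weights)
    return [w / total for w in weights]

probs = bic_model_probabilities([1000.0, 1002.0, 1010.0])
large = bic_model_probabilities([200000.0, 200004.0])
```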

| Comparing the flight-length distributions between years and species
After selecting the best fitting model via BIC, we examine model fit by plotting the theoretical quantiles versus the observed data quantiles.
We then compared the estimates of parameters for our three data sets. In the current likelihood framework, we obtain point estimates for all parameters. We can then use these parameters to estimate the means/medians and variances among the fitted models.

| RESULTS

| Parameter identifiability
Using the 320 simulated data sets, we attempted to infer the parameters from the flight-time model corresponding to that which generated the data using both the naive likelihood (excluding observational process) and the exact multinomial likelihood with the appropriate observational process. Results for the exact likelihood are shown in Figure 1 and the naive likelihood in Figure 2.
Overall, parameter estimates were much more precise for the data recorded at high frequency (i.e., 10 s sampling), regardless of the underlying true distribution or the scale of the true process. This is because the true lengths of dry periods are recorded with high resolution when data are recorded at this high frequency. However, even for the higher resolution data, using the naive likelihood can bias parameter estimates for some cases of the Pareto and q-exponential distributions. This is probably because even the small amount of truncation of dry periods of 30 s or less changes the expected ratio of small flights to longer flights, biasing the estimates. Using the exact likelihood helps to account for these shifts.
In contrast, estimating parameters for a 1 hr integration step is difficult even when using the exact likelihood if the mean/median flight times are on the order (or less) of the integration period (i.e., if true flight times are less than ~5 hr in our simulations). Again, use of the exact likelihood can provide better results, although for the Pareto and q-exponential, both approaches perform poorly if the true median flight duration is short and the integration interval is long.

TABLE 2 Overview of immersion logger data sets used in this study

FIGURE 1 Back-estimation of simulated step-length data sets parameters under emulated sampling regimes of two wet/dry activity logger models using exact likelihood. Bars indicate ranges of parameter estimates

FIGURE 2 Back-estimation of simulated step-length data sets parameters under emulated sampling regimes of two wet/dry activity logger models using naive likelihood. Bars indicate ranges of parameter estimates

| Model identifiability
Using the 320 simulated data sets, we fit all four of the possible flight distribution models. Here we focus on the results from the multinomial approach applied to the appropriate observational process. In Figure 3,

| Flight-length calculation from immersion data
Across both species and irrespective of the observation regime, the model that is most consistent with the observed data is the gamma distribution (Table 3). This is in line with previous results on a subset of the data (Edwards et al., 2007). Based on gamma Q-Q plots (Figure 4) for all three of the data sets, the fitted gamma distributions appear to be reasonable for data both from black-browed albatrosses in 2002 and wandering albatrosses in 2004, although both exhibit heavier tails than would be expected from the gamma distribution. The fit for the data from wandering albatrosses in 1992 is much poorer and underestimates the number of short flights. Because the fit is relatively poor, directly assessing whether or not the patterns are consistent across time periods, or comparing between species, should be approached with care (Table 4).

TABLE 3 Here, we show the difference in BIC from the best performing model (ΔBIC), such that the best model has a value of 0. We also show the calculated model probabilities, based on Equation (2). Data set identifiers correspond to Table 2.

| DISCUSSION
Quantification of the movements of animals, such as seabirds, provides insights into foraging and migration behavior, the underlying drivers of movement, how movement and behavior may change over time in response to these drivers, and the consequences for individual performance and population dynamics (Crossin et al., 2014;Hays et al., 2016). The continuing development of new biologging technology for monitoring animal movement has greatly increased the resolution and quality of the data available, increased sample sizes, and reduced the effort required in the field, particularly for obtaining long time series.
These data represent invaluable archives for reconstructing historical movement patterns of animals for comparison with more recent observations. They provide a window into the past for understanding animal movements and the influence of changing environmental conditions, including the abundance and distribution of prey (Pereira, Paiva, & Xavier, 2017;Seco et al., 2016). As animal movement databases grow (Birdlife International, 2004;Kranstauber et al., 2011), so do the opportunities for historical comparisons. However, as the resolution and accuracy of tracking devices have changed over time, these comparisons must be made with care to ensure robust interpretation.
In this study, we used simulated data to assess the ability to quantify and correctly identify patterns of flight lengths under observational regimes that correspond to the range of older and more recent immersion logger technology for recording landings of foraging seabirds at sea. These simulation experiments are the optimal approach for evaluating new statistical methods: if it is impossible to reconstruct the true parameters from a known distribution, then the inference method will almost certainly be unreliable when applied to experimental or observational data. Furthermore, this approach allows us to examine the impact of the observational method on the resulting conclusions, and to identify ways of comparing and combining disparate data sets to maximize their value.
Using simulated data from a set of four underlying flight-length distributions that have been hypothesized to describe the flight distributions of seabirds (exponential, gamma, Pareto (corresponding to a Lévy flight), q-exponential) that are then "observed" using a sampling regime typical of immersion loggers deployed in the field, we were able to test the effectiveness of our statistical methods. In particular, we focused on the extent to which incorporating truncation and aggregation, which are part of the sampling procedure, into the likelihood estimation was necessary in order to determine accurate parameter values and correctly identify the underlying model. The results indicated that the inference procedure using the exact likelihood performs as well as or better than the naive likelihood in all cases, that is, accounting for the observation process improves our ability to both identify the model and estimate parameters. This improvement comes with a computational cost, as evaluating the exact likelihood is slower, and some tuning of the maximization procedure is required for individual data sets to achieve convergence. For low- and medium-resolution data (or if a Pareto distribution is considered, regardless of the resolution), the computational costs are worthwhile. For the very high-resolution data (at least every 10 s) available from loggers in recent years, it may be sufficient to use the naive likelihood, with no need to incorporate the observational process.

FIGURE 4 Gamma Q-Q plots to assess model fit to the observational data. Data set identifiers correspond to
Even when using the likelihood that incorporates the observational process, care must be taken when analyzing data that have been aggregated at timescales that are much longer than the events of interest. In these cases, the parameters can be significantly biased, and a model different from that used to generate the data may be chosen as the best model. In practice, it may therefore be impossible to accurately compare flight patterns from loggers that provide aggregated data at coarse scales with those from loggers that provide fine scale data. Instead, the latter may need to be degraded (reaggregated) to the coarser scale to determine whether patterns from the two regimes are at least consistent, even if it is not possible to determine whether the parameters are the same. This also puts a constraint on the biological questions that may be compared between data taken at different resolutions.
For instance, short scale inferences about foraging intervals within a food patch may be unreliable, whereas inferences about longer scale movement between patches may be accessible.
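Degrading a fine-scale record to a coarser regime can be sketched as follows, assuming the fine-scale data are a list of 0/1 wet flags per segment and that the coarse logger stores the number of wet segments per interval. The function names are hypothetical, and any trailing incomplete interval is simply dropped.

```python
from itertools import groupby

def reaggregate(wet_flags, segments_per_interval):
    """Count wet segments in each complete coarse interval."""
    k = segments_per_interval
    n_full = len(wet_flags) // k
    return [sum(wet_flags[i * k:(i + 1) * k]) for i in range(n_full)]

def dry_runs(values):
    """Lengths of runs of consecutive zeros (completely dry units)."""
    return [sum(1 for _ in run)
            for is_dry, run in groupby(values, key=lambda v: v == 0) if is_dry]

# Toy record of 14 fine segments, aggregated 3 segments at a time.
# For 10 s segments and 1 hr intervals, segments_per_interval would be 360.
flags = [0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 1]
coarse_counts = reaggregate(flags, 3)
fine_flights = dry_runs(flags)            # flight lengths in fine segments
coarse_flights = dry_runs(coarse_counts)  # flight lengths in coarse intervals
```

In this toy record two equal fine-scale flights become coarse flights of different lengths, because the split depends on where the interval boundaries fall relative to the landings.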
Based on the simulated data, we were not able to identify the process model underlying the "observations" with the coarse sampling regime consistently and accurately. Thus, we must be cautious when attempting to infer whether or not this particular aspect of the foraging strategy of the albatross has changed over the past two decades based on the type of data at hand. Even for the higher temporal resolution data that we present here, the lack of model fit indicated by the Q-Q plots (Figure 4) is concerning. In particular, there are more long flights than would be typical for the best fitting gamma model. The question is why this would be the case. In some cases, where concurrent location data are available, we may be able to determine that some longer flights do not represent foraging behaviors and can be excluded. This was the case for a proportion of the data for which we had concurrent location data.
The longer flights could also indicate individual birds that are not exhibiting foraging behavior (for instance attempting to fly out of a storm). In the current analysis, we have treated the behavioral state as known, such that the wet status corresponds to feeding/handling attempts and the dry status to foraging flight. Thus, we have not utilized a more complex statistical approach, such as state-space modeling (Patterson, Thomas, Wilcox, Ovaskainen, & Matthiopoulos, 2008), that can allow concurrent estimation of behavioral state. If the observed longer flights are a result of a separate nonforaging flight mode, these methods may be useful for identifying them.
Another possibility is that the mismatch between the data and the models is a symptom of interindividual differences in behavior, or otherwise more complex behavior than the simple models here allow. If this is the case, improving the models themselves, as well as the statistical techniques to analyze them, will be a more fruitful way forward.
Even in the case of developing new models for the underlying behavior, it may be that direct parameterization of all model components is not possible. Instead, the quantified patterns explored here could be directly compared with model outputs, for instance emergent flight lengths from an optimal foraging or individual-based model. Although model parameterization and validation of more complex models can be challenging, they can allow us to better understand why we see particular patterns and to better predict how behavior may change into the future.