Animal movement tools (amt): R package for managing tracking data and conducting habitat selection analyses

Abstract Advances in tracking technology have led to an exponential increase in animal location data, greatly enhancing our ability to address interesting questions in movement ecology, but also presenting new challenges related to data management and analysis. Step‐selection functions (SSFs) are commonly used to link environmental covariates to animal location data collected at fine temporal resolution. SSFs are estimated by comparing observed steps connecting successive animal locations to random steps, using a likelihood equivalent of a Cox proportional hazards model. By using common statistical distributions to model step length and turn angle distributions, and including habitat‐ and movement‐related covariates (functions of distances between points, angular deviations), it is possible to make inference regarding habitat selection and movement processes or to control one process while investigating the other. The fitted model can also be used to estimate utilization distributions and mechanistic home ranges. Here, we present the R package amt (animal movement tools) that allows users to fit SSFs to data and to simulate space use of animals from fitted models. The amt package also provides tools for managing telemetry data. Using fisher (Pekania pennanti) data as a case study, we illustrate a four‐step approach to the analysis of animal movement data, consisting of data management, exploratory data analysis, fitting of models, and simulating from fitted models.


Introduction
Advances in technology have led to large collections of fine-scale animal biotelemetry data (Cagnacci et al., 2010;Kays et al., 2015), fueling the development of new quantitative methods for studying animal movement (Hooten et al., 2017).Nathan et al. (2008) introduced the movement ecology paradigm, that conceptually connects different factors shaping the realized movement path of animals (e.g., the internal state of an animal, interaction with intra-and conspecifics, and varying environmental conditions).The movement ecology paradigm can serve as a framework for generating new hypotheses about animal movements.To test these hypotheses, efficient and straightforward tools for the management and analyses of movement data are required.Although a large number of R packages have been developed for analyzing animal movement data (e.g., Calabrese et al., 2016;Gurarie et al., 2009;Michelot et al., 2016) these packages often utilize domain-specific data formats and focus on a narrow subset of analytical methods (e.g., methods for fitting discrete or continuous time movement models, trajectory segmentation).We had two primary objectives in developing the amt R-package, namely to provide: 1) a set of functions for exploratory analyses of movement data in R, and 2) functions that facilitate the analysis of fine-scale animal location data using Step-Selection Functions (SSFs).
Step-Selection Functions are powerful tools for modeling animal movement and habitat selection, but are not currently available in open-source software packages, despite their popularity.
Methods that quantify habitat selection by linking environmental covariates to location data of animals have been around for a long time.Traditionally Resource Selection Functions (RSF; Boyce & McDonald, 1999;Manly et al., 2007) were used to study habitat selection of animals.RSFs compare covariates associated with locations where the animal was observed with covariates associated with random locations within the 'availability domain', a spatial domain within which any location is assumed available for the animal to use at any given time.Despite the sensitivity of the resulting inference to habitat availability (Beyer et al., 2010), no consensus exists as to the most suitable approach to delineate the spatial domain of availability (Northrup et al., 2013;Paton & Matthiopoulos, 2016;Prokopenko et al., 2017b).Moreover, the assumption that the availability domain can be considered temporally static might have been justifiable for very coarse sampling rates (e.g., daily or weekly positions of the animal), but is challenging for modern GPS data with sampling rates < 1 hour.Step-Selection Functions (SSFs;Fortin et al., 2005;Thurfjell et al., 2014) resolve these issues by pairing each observed location with a set of random locations deemed accessible from the previously observed location.
Step-Selection Functions estimate conditional selection coefficients using a likelihood equivalent of a Cox proportional hazards model (Gail et al., 1981).
Until recently, SSFs were fitted by sampling random points based on the empirical (observed) distribution of 'steps' (straight lines connecting consecutive locations).This approach has come under some scrutiny as it implicitly assumes habitat selection is conditional on animal movement but not vice versa, potentially leading to biased inference (Forester et al., 2009).A recent extension, termed integrated SSF (iSSF), alleviates this concern and allows for simultaneous inference of habitat selection and movement processes (Avgar et al., 2016).This is accomplished by requiring that random steps are sampled under one of several analytical distributions, and also by including, in addition to habitat-related covariates, movement-related covariates (functions of distances between points, angular deviations) resulting in likelihood-based estimates of the shape and scale of the underlying analytical distributions (Avgar et al., 2016;Duchesne et al., 2015;Forester et al., 2009).Unlike SSFs (that do not include an explicit movement component), a fitted iSSF is a fully-fledged biased random walk model that can be used to simulate animal space-use (Duchesne et al., 2015;Avgar et al., 2016;Signer et al., 2017).
Step-selection functions (SSFs and iSSFs) are usually straight forward to fit (using any conditional-logistic regression routine) once data are appropriately structured, but data preparation itself tends to be more complex and confusing and may thus become a limiting step in the application of this approach.Here, we describe the amt package for R, which provides a flexible and coherent workflow for efficient analysis of animal tracking data.We make heavy use of piped workflows and list-columns as introduced to R through the tidyverse package-family (Wickham, 2017).We illustrate a typical workflow for fitting a (i)SSF using fisher (Pekania pennanti ) data from LaPoint et al. (2013a).Detailed vignettes, help files, sample data and analyses are available in the amt package available on CRAN (https://cran.r-project.org/package=amt)

Functionality
A typical workflow to analyze animal tracking data can be divided into four main steps (described in detail below): 1. Data preparation, inspection and management: Load and inspect gaps in the data, resample tracks if needed, and adjust coordinate reference systems.
2. Exploratory data analysis and descriptive analyses: Explore patterns in the data graphically, consider multiple movement characteristics (e.g., step-length distribution, net square displacement, or home-range size) across several animals and/or time periods.

Modeling:
Fit models to answer questions or test hypothesis related to movement and space use of animals.
4. Simulation: Use fitted models to simulate derived quantities (e.g., space use) and assess model fit.
Data preparation, inspection, and management After loading data into R, users should perform a variety of data quality checks and possibly remove fixes with missing coordinates (although this information could potentially be used to test if fixes are missing at random).We provide functions to quantify variability in sampling rates over time and among individuals, inspect the data visually for obvious outliers (e.g., determined by screening for unreasonable speeds), remove periods at the beginning and the end of the track to exclude possible capture effects, and resample the data to form regular bursts (i.e., partition the track into groups of observations with regular sampling rates, within some specified level of tolerance).
Exploratory data analysis and descriptive analyses Once data have been cleaned, the next logical step is to explore the data by looking at different movement-related statistics (e.g., distributions of turning angles or step lengths) and trajectory and space-use summaries (e.g., net squared displacement, path sinuosity, home range area).These summaries may be calculated for the whole trajectory or on a subset of points (a track might be split by time of the day, season, year or any other biologically meaningful factor).
Modeling In the next step, we fit models to test hypotheses about animal movement and habitat selection.Importantly, amt provides functionality necessary for data development steps prior to fitting RSFs and (i)SSFs (e.g., methods for generating random points or steps, and extract environmental covariates for the observed and random steps).For many other analyses (e.g., behavioral change point analyses, fitting continuous time movement models or identification of hidden behavioral states with hidden Markov models), amt provides coercion functions to translate location data into objects of classes required by the respective packages.
Simulation As a final step, new data can be simulated from fitted models.Simulations can be used to obtain estimates of space use (i.e. the utilization distribution), identify corridors of high use, or asses the power of the model (testing how well parameters can be recovered as a function of sample size).Many packages that fit models also provide methods to simulate from fitted models (e.g., ctmm or moveHMM).amt provides means to simulate space use from fitted SSFs.

Case study
We illustrate a subset of the above steps using data from radio collard fishers available through movebank (LaPoint et al., 2013a,b).For details about the data and the capture of the animals, we refer to Brown et al. (2012) andLaPoint et al. (2013a).We begin by analyzing the space use of Ricky T (id 1016), and then illustrate how similar analyses can be extended to several animals for population-level inference (Fieberg et al., 2010).
From data cleaning to simulated space use for one animal We begin with loading the data of all fishers, remove observations with missing spatial coordinates (longitude, latitude), and subset relocations for Ricky T (id: 1016).
dat_1 <-mk_track(dat_1, .x= x, .y= y, .t=t, crs = sp::CRS("+init=epsg:4326")) %>% transform_coords(sp::CRS("+init=epsg:5070")) We then summarize the distribution of time intervals between successive locations to get a general impression for the sampling rate.We see that we have 8957 total locations, the shortest interval between locations is 0.1 minutes and the largest time interval between locations is 1208 minutes, with median interval length equal to roughly 2 min.Despite the 2 min temporal resolution, we choose to resample the track to 10 min with a tolerance of 1 min (track resample), in order to conduct the analyses on the same temporal scale as the next example.The function minutes from the package lubridate (Grolemund & Wickham, 2011), is used here to create an object of class Period that is then passed to track resample.Periods can be specified using all common time units, thus it is straightforward to specify sampling rate and an acceptable tolerance.We will also choose to keep only those bursts (subsets of the track with constant sampling rate, within the specified tolerance) with at least three relocations, the minimum required to calculate a turn angle (filter min n burst).The following code implements those choices, and translates from a point representation to a step (step length, turn angle) representation of the data.In the final line of the code snippet, we use the function time of day (a wrapper around maptools::sunriset and maptools::crepuscule; Bivand & Lewin-Koh, 2017) to calculate if a location was taken during the day or night.If the argument include.crepuscule is set to TRUE, the function not only considers day and night, but also dawn and dusk.We then use the str function to inspect the structure of stps.stps is a regular data frame with 11 attributes of steps (e.g., start, end, and step length; columns) and 1501 steps (rows).For each step, the start (x1 , y1 ) and end (x2 , y2 ) coordinates, as well as the start and end time (t1 , t2 ) are given.In addition the following derived quantities are calculated: step length (sl ; in CRS units), turning angles (ta ; in degrees; notice that it cannot be calculated for steps that are not preceded by a valid step), the time difference (dt ), and the burst (burst ) to which the step belongs.We proceed by preparing the environmental data.We hypothesized that Ricky T prefers forested wetlands over other landuse classes.We used the National Landcover Database (which is freely available at https://www.mrlc.gov/nlcd11_data.php).We first load the landuse raster and create a layer called wet that is 1 for forested wetlands (category 90) and 0 otherwise (using the raster package; Hijmans, 2017).land_use <-raster("data/landuse_study_area.tif")wet <-land_use == 90 For convenience and readability, we give the layer a meaningful name.

names(wet) <-"wet"
Before proceeding to modeling space use and habitat selection of Ricky T, we perform some exploratory data analysis based on step length and turning angles in different habitat types (wet forest versus other areas) and time of the day (day and night).We will have to extract the covariate value at the start point of each step (using the function extract covariates) and plot the density of step lengths per habitat class and time of day (Fig. 1; for the full code to replicate Fig. 1 see Supplement S1).Note that the function extract covariates takes an argument where, that indicates whether covariate values should be extracted at the beginning or the end of a step (both is also possible to extract the covariate at the start and the end of a step).Depending on the target process under investigation (habitat selection or movement), covariates might be extracted at the end of the step (habitat selection process) or at the start of the step (movement process).If covariates are extracted at the end of the step, they are typically included in the model as main effects, to answer questions of the type: How do covariates influence where the animal moves?In contrary, if covariates are extracted at the beginning of the step, they are typically included in the model as an interaction with movement characteristics (step length, log of the step length, or the cosine of the turn angle), to test hypotheses of the type: Do animals move faster/more directed, if they start in a given habitat?Finally, covariate values at the start and the end of a step can also be included in the model as interaction with each other, to test hypotheses of the type: Are animals more likely to stay in a given habitat, if they are already in the habitat?
In order to fit SSFs, the observed covariates associated with observed steps are compared to covariates associated with random steps.Random steps can be generated by either 1) sampling from the observed turn step-length and turn-angle distribution (resulting in a traditional SSF), or 2) by fitting a parametric distribution to the observed step lengths and turn angles (which can result in an iSSF).As mentioned above, an iSSF is arguably less biased, and also provides the user with a mechanistic movement model that can be used to simulate space use, and hence utilization distributions (Avgar et al., 2016;Signer et al., 2017).For these reasons, amt only implements the iSSFs with parametric distributions.For further details we refer the reader to Duchesne et al. (2015) and Avgar et al. (2016).
Thus, we proceed by fitting a gamma distribution to the step lengths and a von Mises distribution to the turn angles using maximum likelihood (Agostinelli & Lund, 2017;Delignette-Muller & Dutang, 2015), and use these distributions to generate and pair 9 random steps with each observed step.Zero step lengths can cause estimation problems, so random steps automatically adds a random error between 0 and an upper limit that can be specified though the argument random error (with default= 0.001, alternatively the mimimum of observed step lengths would be a good choice).We then extract the covariates at the end point of each step (observed and random) using the function extract covariates, and fit a conditional logistic regression model to the resulting data including movement-related covariates with the function fit issf (a wrapper to survival::clogit; Terry M. Therneau & Patricia M. Grambsch, 2000).
We included two main effects in the model, the environmental covariate wet, and the log of the step length (log sl ) as a modifier of the shape parameter of the underlying gamma distribution.We also include interactions between wet and tod , a factor with two levels -day (the reference category) and night, and between tod and log sl .These interactions are included to the test the hypotheses that habitat selection and displacement rate, respectively, differ between day and night.m1 <-stps %>% random_steps(n = 9) %>% extract_covariates(wet) %>% time_of_day(include.crepuscule = FALSE) %>% mutate(log_sl_ = log(sl_)) %>% fit_issf(case_ ~wet + log_sl_ + wet:tod_end_ + log_sl_:tod_end_ + strata(step_id_)) We could have also included cosines of the turning angles and their interaction with day.This choice would modify the concentration parameter of the underlying von Mises distribution for the turning angles and allow the degree of directional persistence to depend on time of day; the data summarized in Fig. 1 suggests that this could be a sensible choice.For the sake of simplicity, however, we have assumed we have correctly modeled the degree of directional persistence and that it does not differ between day and night.
Inspecting the fitted model (Table 1), we make the following observations.1) there is evidence to suggest that the animal prefers forested wetlands over other landuse classes, 2) there is no difference in habitat preference between day and night, 3) there is evidence to modify the shape of the underlying gamma distribution (through the log of the step length), and 4) the modification of the shape parameter should be done separately for day and night.
Besides inspecting the coefficients and their standard errors, we can calculate derived quantities, such as the expected speed.Because we included an interaction between parameters of the step length distribution and time of the day, we have to account for this interaction when calculating the expected speed for day and night.We begin by retrieving the tentative parameter estimates for the gamma distribution of the step length distribution: shape <-sl_shape(m1) scale <-sl_scale(m1) And adjust the shape for day and night with the estimates of the corresponding coefficients from the fitted model (Avgar et al., 2016).
shape_adj_day <-adjust_shape(shape, coef(m1)["log_sl_"]) shape_adj_night <-adjust_shape(shape, coef(m1)["log_sl_"]) + coef(m1)["log_sl_:tod_end_night"] The underlying gamma distributions for the step lengths vary by time of day (Table 1).The expected speed for day and night is then given by the product of the tentative scale parameter (no adjustment is needed here, because we did not include step length in the model) and the adjusted shape parameter.To obtain 95% confidence intervals for the mean speed, we bootstrapped the model m1 1000 times by resampling (with replacement) the strata (for full code see Supplement S1).Results suggest that Ricky T moves significantly faster during nights (11.0 m/min, 95% CI = 10.7,11.4 m/min) than during days (8.57m/min, 95% CI = 7.8, 9.32 m/min).
In a final step, we simulated space-use from the fitted model m1 to obtain a modelbased estimate of the animal's utilization distribution (UD; Avgar et al., 2016;Signer et al., 2017).Generally, two types of UDs can be simulated: the transient UD and the steady state UD.The transient UD describes the expected space-use distribution of the animal within a short time period, and is hence conditional on the starting position.The steady state UD describes the expected space-use distribution of the animal in the long-term.In order to simulate UDs one has to ensure that the animals stay within the study domain.We see three possible methods for achieving this goal: 1) use a covariate that attracts the animal towards one or more centers of activity (e.g., the squared distance to the mean of all coordinates), 2) use a very large landscape, or 3) use a wrapped landscape (torus).Here, we illustrate the simulation of steady state and transient UDs.For the steady state UD we simulate from the first observed location 10 7 time steps on a toroid landscape, once for day and once for night.For the transient UD, we are interested in the UD up to 10 hours after last observation, we therefore simulated 72 steps (at a 10 min sampling rate) 5 × 10 3 times.
We describe the simulation for the steady state and transient UD for daytime.First we create a movement kernel (Fig. 2A), that is used to determine the animal's movement ability at each time step.Note, we use the tentative scale estimate and the shape estimate adjusted for day.Next, we create a habitat kernel (that is for each pixel we calculate the estimated selection coefficients times the resources and exponentiate the product; Fig. 2B).
hk <-habitat_kernel(coef = list(forest = coef(m1)["wet"]), resources = wet) We then estimate the steady state UD (Fig. 2CE) with the function simulate ud: In order to simulate the transient UD (Fig. 2CE), we have to repeatedly simulate short tracks starting at the same point, and then sum individual UDs and normalize, which we do with a simple for-loop.All simulations took < 1 minutes on a standard laptop.

Many animals: quantifying population-level effects
We start again with the same data set (dat), containing data from 6 individual fishers.This time we are interested in quantifying among-animal variability in the selection coefficients.We proceed using nearly all the same steps as in the first example, but with a different data structure: data frames with list-columns (Müller & Wickham, 2018).List columns are best thought of as regular columns of a data frame that are R lists and can contain any objects (in our case tracks and fitted models).The purrr::nest command can be used to nest data into a list-column (Henry & Wickham, 2017).
dat_all <-dat %>% nest (-id) dat all is now a data frame with 6 rows (one for each individual) and two columns.In the first column the animal id is given, and in the second column (by default named data) the relocations of the corresponding animal are saved.We start by assigning the sex of each animal.
dat_all$sex <-c("f", "f", "f", "m", "m", "m") We can now apply the steps as before for all animals.We first create a track for each animal and transform the coordinate reference system.Next, we prepare again the landuse data.This time we reclassify the landuse raster (using raster::reclassify) into five categories: water and wetland forests, developed open spaces, other developed areas, forests and shrubs, and crops.land_use <-raster("data/landuse_study_area.tif")rcl <-cbind(c(11, 12, 21:24, 31, 41:43, 51:52, 71:74, 81:82, 90, 95), c(1, 1, 2, 3, 3, 3, 2, 4, 4, 4, 4, 4, 4, 4, 4, 4, 5, 5, 1, 1)) lu <-reclassify(land_use, rcl) names(lu) <-"landuse" We again first inspect the sampling rate of the 6 individuals: This time some individuals have a 2 min sample rate and others a 10 min one.Thus we decided to resample the tracks to the same sampling rate of 10 minutes (noting that (i)SSF inference is scale dependent; Signer et al., 2017) using track resample.We then filter again bursts, keeping only those with at least three points (filter min n burst), convert from a point to a step representation of the tracks (steps by burst), generate 9 random steps for each observed step (random steps), extract the environmental covariates (extract covariates), convert landuse to a factor (mutate) and fit a SSF (fit issf).The main difference to the previous example here, is that the all the steps from above are wrapped into one mutate call.This call creates a new column to dat all called ssf.This is a list column and each entry in this columns contains a fitted SSF.x %>% track_resample(rate = minutes(10), tolerance = minutes(2)) %>% filter_min_n_burst(min_n = 3) %>% steps_by_burst() %>% random_steps() %>% extract_covariates(lu) %>% mutate(landuse_end = factor(landuse)) %>% fit_issf(case_ ~landuse_end + strata(step_id_)) })) m1 is still a data frame with one new column: ssf, that is again a list column with a fitted SSF.From here, it is easy to investigate coefficients for several animals and look at population-level effects.The results suggest that there are some general population-level trends (Fig. 3).All fisher seem to prefer wetland forests and natural areas relative to developed areas (of either type), whereas considerable among-animal variability in the coefficients for crops makes it difficult to draw firm conclusions about this landuse type.Lastly, there seems to be little differentiation based on sex.(Fig. 3, code provided in Supplement S1).

Discussion and Outlook
We have illustrated how amt can be used to fit Step-Selection Functions (SSFs) and explore temporal movement and habitat selection patterns at the individual and population levels.We demonstrated how an iSSF, fit to a single fisher, can be used to simulate utilization distributions (UDs Signer et al., 2017).The UD map (Fig. 2) can be thought as of a stochastic approximation of a mechanistic home-range model (Moorcroft & Lewis, 2013).Whereas traditional home-range estimators offer static summaries of space-use patterns, mechanistic home-range estimators can provide insights into the movement and habitat selection processes that give rise to these patterns.In our model, we included an interaction between parameters of the movement model and time of the day (day/night), allowing us to explore time-dependent spaceuse patterns (Fig. 2B,D).We then showed how amt can be used to conduct similar analyses with more than one animal, allowing us to investigate population-level effects by looking at the distribution of animal-specific coefficients (Fig. 3 Fieberg et al., 2010).In our example we restricted the analysis to habitat selection, incorporation of a movement model would be straight forward here as well (Prokopenko et al., 2017a;Scrafford et al., 2017).
We expect amt will contribute to movement ecology in two ways.First, amt is likely to help researchers manage their data and analyses using a more reproducible workflow, a much discussed issue (e.g., Lewis et al., 2018;Cooper & Hsing, 2017).Second, amt will facilitate the use of iSSFs by a wider community of ecologists and also allow them to more fully realize the power of these methods (e.g., by modeling how landscape features influence both movement and habitat selection processes).Prior to amt, software for implementing iSSFs was not available.Therefore, use of iSSFs required custom-written code.amt provides functions that make it easy to fit iSSFs and to explore predicted space-use patterns from fitted models.
Besides the introduced functions to fit SSFs, amt provides additional functions for calculating home ranges, estimating RSFs and other utility functions to work with telemetry data and interface with other packages.Future development of amt will focus on increased functionality by adding more functions for data quality assurance.

Figure 3 :
Figure3: Point estimates with 95% confidence intervals for the relative selection strength(Avgar et al., 2017) for different landuse classes (we used wetland forests and wet areas as the reference class).Different colors indicate the id of the animals and symbols the sex (circles for female and triangles for males).Population-level estimates are given by solid vertical lines and 95% confidence intervals at population level is given by the light gray boxes.The dashed horizontal line indicates no preference relative to wetland forest (the reference category).