Search and foraging behaviors from movement data: A comparison of methods

Abstract Search behavior is often used as a proxy for foraging effort within studies of animal movement, despite it being only one part of the foraging process, which also includes prey capture. While methods for validating prey capture exist, many studies rely solely on behavioral annotation of animal movement data to identify search and infer prey capture attempts. However, the degree to which search correlates with prey capture is largely untested. This study applied seven behavioral annotation methods to identify search behavior from GPS tracks of northern gannets (Morus bassanus), and compared outputs to the occurrence of dives recorded by simultaneously deployed time–depth recorders. We tested how behavioral annotation methods vary in their ability to identify search behavior leading to dive events. There was considerable variation in the number of dives occurring within search areas across methods. Hidden Markov models proved to be the most successful, with 81% of all dives occurring within areas identified as search. k‐Means clustering and first passage time had the highest rates of dives occurring outside identified search behavior. First passage time and hidden Markov models had the lowest rates of false positives, identifying fewer search areas with no dives. All behavioral annotation methods had advantages and drawbacks in terms of the complexity of analysis and ability to reflect prey capture events while minimizing the number of false positives and false negatives. We used these results, with consideration of analytical difficulty, to provide advice on the most appropriate methods for use where prey capture behavior is not available. This study highlights a need to critically assess and carefully choose a behavioral annotation method suitable for the research question being addressed, or resulting species management frameworks established.


| INTRODUCTION
Movement is a major part of a species' ecology. The underlying processes driving the movement of individuals and populations are studied widely; however, it is often unfeasible to directly observe animals through constant effort. As a result, movement studies have focussed on remote detection of animals through technologies such as GPS and satellite tracking. The development, miniaturization, and reduction in cost of remote tracking technologies have enabled their widespread use in ecological studies (Cagnacci, Boitani, Powell, & Boyce, 2010).
Remote tracking enables behaviors to be inferred from an animal's trajectory (Buchin, Driemel, Kreveld, & Sacristán, 2010), and has led to rapid advances in the understanding of species' ecology (Nathan et al., 2008).
While movement patterns are often used to distinguish active phases from rest, or search behavior from traveling (van Beest & Milner, 2013; Dzialak, Olson, Webb, Harju, & Winstead, 2015), identifying these behavioral states typically relies on more complicated modeling procedures to detect potential underlying mechanisms within behavior identification (Jonsen, Myers, & James, 2006; Kerk et al., 2015).
Considerable progress has been made in developing methods that can categorize behaviors based on simple movement metrics (Edelhoff, Signer, & Balkenhol, 2016). These methods commonly identify multiple states and ascribe these to predefined behaviors such as search, rest, or travel (Evans, Dall, Bolton, Owen, & Votier, 2015; Guilford et al., 2008; Hamer, Phillips, Wanless, Harris, & Wood, 2000; King, Glahn, & Andrews, 1995; Palmer & Woinarski, 1999; Shepard, Ross, & Portugal, 2016; Weimerskirch et al., 2006). However, Gurarie et al. (2016) argued for closer and more detailed exploratory analysis of movement data to prevent mis-specification of behavior, suggesting that the strengths of particular methods need to be more carefully considered so they are suitably attuned to the specific questions being asked by researchers.
Within conservation management, there is an increasing reliance on identifying space use by species of conservation concern (Allen & Singh, 2016). For example, within the marine environment, foraging areas could be considered for the protection and management of seabird species (Lascelles et al., 2016). The use of these approaches may contribute to the establishment of conservation measures including designation of marine protected areas (Grüss, Kaplan, Guénette, Roberts, & Botsford, 2011). Foraging activity is a key component in an animal's time and energy budget, and it is well established that animals in environments with patchy resources must engage in search behavior to optimize their foraging effort in terms of maximizing prey encounters (MacArthur & Pianka, 1966). Therefore, foraging can be considered a two-part system, containing both search and prey capture attempts (Charnov, 1976). Understanding the interaction between search and prey capture is a key component in optimal foraging theory (Pyke, 1984). For example, while there has been much work identifying area-restricted search (Knell & Codling, 2012), there is little information on the relationship between search and prey capture. Validation of search behavior is difficult particularly in animals where direct observation is challenging, such as those in many biotelemetry studies.
Many movement studies use path segmentation techniques to detect search behavior; however, many of these are unvalidated estimates of search due to the lack of a second data stream for ground-truthing.
Validation of prey capture attempts has been achieved using animal-borne cameras (Bicknell, Godley, Sheehan, Votier, & Witt, 2016; Moll, Millspaugh, Beringer, Sartwell, & He, 2007), time-depth recorders (Dean et al., 2012; Shoji et al., 2015; Tinker, Costa, Estes, & Wieringa, 2007), stomach loggers (Weimerskirch, Gault, & Cherel, 2005), and accelerometers (Hansen, Lascelles, Keene, Adams, & Thomson, 2007; Sato et al., 2007), among others. However, many of these technologies are either expensive, resulting in small sample sizes, or too large to deploy on animals in combination with location loggers without significant adverse impacts (Barron, Brawn, & Weatherhead, 2010; Hammerschlag, Gallagher, & Lazarre, 2011; Vandenabeele, Shepard, Grogan, & Wilson, 2012). As a result, many studies still rely on the sole use of location data and path segmentation approaches to identify behavior. The determination of behavior from movement data is an active area of research and the subject of many reviews (Allen, Metaxas, & Snelgrove, 2017; Edelhoff et al., 2016; Hays et al., 2016; Jacoby, Brooks, Croft, & Sims, 2012). There are several different methods for undertaking behavioral annotation or detecting important areas of high use by animals. Two frequently used approaches are movement pattern description and process identification. Methods based around movement pattern description typically aim to split a track into different behavioral periods or to locate changes in behavior (Edelhoff et al., 2016). Process identification goes a step further, concentrating on methods that describe the underlying processes, whether extrinsic or intrinsic, and how these inform behavior.
Northern gannets (Morus bassanus), hereafter gannets, are a well-studied species that occur principally in the temperate shelf seas of the North Atlantic during the breeding season. Gannets are visual predators (Cronin, 2012) and undertake plunge-diving from height, entering the water at speeds of up to 24 m/s (Chang et al., 2016).
Prior to diving, gannets typically slow their flight and increase their path sinuosity (Bodey et al., 2014; Patrick et al., 2014; Warwick-Evans et al., 2015). The relationship between slow speed during search and prey capture attempts has been established both theoretically (Bartoń & Hovestadt, 2013; Benhamou, 2004) and empirically in a variety of mobile marine and terrestrial species (Anderson & Lindzey, 2003; Byrne & Chamberlain, 2012; Edwards, Quinn, Wakefield, Miller, & Thompson, 2013; McCarthy, Heppell, Royer, Freitas, & Dellinger, 2010; Towner et al., 2016; Wakefield et al., 2013; Williams et al., 2014). Such changes in movement and clearly identifiable prey capture attempts in the form of dives (Cleasby et al., 2015; Garthe, Benvenuti, & Montevecchi, 2000), as well as their ability to carry multiple devices and ease of recapture, make gannets a suitable model species to explore the ability of movement-based analysis to identify search behavior and prey capture attempts.
In this study, we apply and compare seven methodologies covering movement pattern description and process identification, to predict search behavior in gannets using GPS location data. If search behavior is a precursor to prey capture attempts, dives will occur primarily within areas identified as search. With consideration given to opportunistic foraging, we hypothesize that more successful methods of search classification will contain more true positives (dive events occurring within identified search), fewer false positives (search containing no dives), and fewer false negatives (dives occurring outside identified search behavior). Using this framework, we will also provide recommendations on the appropriate use of methodological approaches.

| Data collection
Breeding adults at two island colonies, Great Saltee, Co. Wexford, Ireland (52.11286, −6.62189) and Bass Rock, Scotland, UK (56.07672, −2.64139), were tracked while attending 2 to 7-week-old chicks over a 38-day period from late June to early August 2011. Nine birds at Great Saltee and eight birds at Bass Rock were caught using a metal crook or wire noose fitted to a 4 to 6-m pole and fitted with GPS loggers coupled with time-depth recorders (TDRs). GPS loggers (i-gotU GT-200, Mobile Action Technology Inc., Taipei, Taiwan, 37 g), sealed in heatshrink plastic, recorded locations every 2 min. CEFAS G5 TDRs (CEFAS Technology, Lowestoft, UK, 2.5 g) were deployed using the fast-log dive sensor at 4 Hz and used to identify dive events based on a 1 m depth threshold being exceeded, hereafter TDR dives. This was to ensure dives reflected prey capture attempts (median dive depth of 4.6 m in plunge-diving gannets and 8 m when pursuit diving; Garthe et al., 2000) rather than other surface-related activities such as resting, washing, or preening. Devices were attached following Grémillet et al. (2004), which involved affixing loggers ventrally to 2-4 central tail feathers using strips of waterproof Tesa© tape.
Total instrument mass was ≤2% of body mass, below the maximum recommended for seabird biologging studies (Phillips, Xavier, & Croxall, 2003), and tag position was considered to minimally impede gannets aerodynamically or hydrodynamically (Vandenabeele et al., 2012). Deployment and retrieval handling times were approximately 10 min.
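The 1 m threshold rule for identifying TDR dives can be illustrated with a short sketch. This is not the authors' processing code: the function name `dive_events`, the example depth trace, and the choice to report the maximum depth per event are assumptions made for illustration only.

```python
from typing import List, Tuple


def dive_events(times: List[float], depths: List[float],
                threshold_m: float = 1.0) -> List[Tuple[float, float]]:
    """Return (start_time, max_depth) for each run of samples deeper than threshold_m."""
    events = []
    in_dive = False
    start = 0.0
    max_depth = 0.0
    for t, d in zip(times, depths):
        if d > threshold_m:
            if not in_dive:
                # First sample below the threshold marks the start of a dive event.
                in_dive, start, max_depth = True, t, d
            else:
                max_depth = max(max_depth, d)
        elif in_dive:
            # Back above the threshold: close the current event.
            events.append((start, max_depth))
            in_dive = False
    if in_dive:
        events.append((start, max_depth))
    return events


# Hypothetical depth trace (m) sampled at 4 Hz, i.e. every 0.25 s.
depths = [0.0, 0.2, 1.5, 3.0, 4.6, 2.0, 0.5, 0.0, 0.3, 1.2, 0.1]
times = [i * 0.25 for i in range(len(depths))]
events = dive_events(times, depths)
```

Each event is reduced to its start time, which is the quantity later matched against the GPS track.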

| Data processing
GPS tracks were processed using the adehabitatLT package (Calenge, 2011) in the R statistical framework. Location data were transformed into Cartesian coordinates using a Universal Transverse Mercator (UTM) 30N projection before calculating step length and turning angles. Although GPS tags were programmed to take locations every 2 min, if there was no available GPS signal (for example, because a bird was diving), locations may not have been exactly 2 min apart, and so tracks were standardized through linear interpolation to a 2-min interval. Speed, step length, turning angle, and distance from colony were calculated for every point along a bird's track. Points within 5 km of the colony were removed to avoid potential locations associated with colony rafting and bathing (Carter et al., 2016), as were those occurring at night (between civil sunset and sunrise) because gannets are visual diurnal foragers (Nelson, 2002). TDR records were split into dive events, each represented by a single timestamp marking the start of any dive deeper than 1 m, for appending to tracks following behavioral classification.
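The track metrics described above can be sketched as follows. The study used adehabitatLT in R; the Python below is only an illustrative equivalent, assuming coordinates are already projected to metres and regularized to a 2-min interval. The function name `step_metrics` and the example coordinates are hypothetical.

```python
import math


def step_metrics(xs, ys, dt_s=120.0):
    """Step length (m), speed (m/s) and turning angle (rad) from projected coordinates."""
    steps, headings = [], []
    for i in range(len(xs) - 1):
        dx, dy = xs[i + 1] - xs[i], ys[i + 1] - ys[i]
        steps.append(math.hypot(dx, dy))
        headings.append(math.atan2(dy, dx))
    speeds = [s / dt_s for s in steps]
    turns = []
    for i in range(1, len(headings)):
        d = headings[i] - headings[i - 1]
        # Wrap the heading change into the interval [-pi, pi].
        turns.append(math.atan2(math.sin(d), math.cos(d)))
    return steps, speeds, turns


# Hypothetical track: three 1200 m steps with two left-hand right-angle turns.
steps, speeds, turns = step_metrics([0, 1200, 1200, 0], [0, 0, 1200, 1200])
```

A track of n fixes yields n - 1 step lengths and n - 2 turning angles, so the first and last fixes carry incomplete metrics.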
We applied a suite of methods commonly used to identify searching or infer foraging behaviors from movement data, summarized in Table 1. The methods are not considered exhaustive, but represent a range of approaches covering movement pattern description and process identification (Edelhoff et al., 2016). Movement pattern description approaches include kernel density, first passage time (FPT), and speed/tortuosity thresholds, while process identification techniques applied covered k-means clustering and two state-space models, hidden Markov models (HMM) and expectation-maximization binary clustering (EMbC). The two forms of state-space models were used to represent diverging classes of state-space model: maximum likelihood methods (EMbC) and Bayesian Monte Carlo methods (HMM) (Patterson, Thomas, Wilcox, Ovaskainen, & Matthiopoulos, 2008).
While not predicting/identifying search behavior directly, we also applied machine learning (generalized boosted regression models) to predict dives from track metrics rather than search behavior. We followed the standard methodology for each technique outlined in the published literature, and provide references for detailed guidance on applying each approach.

TABLE 1 Summary of common methodological approaches to identifying search and foraging behavior in movement data. While all methods require validation data to assess how well the method works, it is not necessarily required to implement the method.
Methods of predicting search behavior routinely identify chains of search in successive locations. Chains can be a single point in length or may include multiple consecutive points along a movement track (see Figure 1). Given that in gannets, individual prey capture attempts (dives) occur at discrete locations/times, we extracted metrics of dives occurring within search, dives outside of search, and search containing no dive. Data from the two colonies were processed independently to account for potential differences in movement metrics associated with differences in local habitat and prey availability.
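The three comparison metrics (dives within search, dives outside search, and search chains containing no dive) can be made concrete with a minimal sketch. It assumes each GPS fix already carries a search/non-search label and that each dive has been matched to the index of a fix; the function name `chain_metrics` and the example data are hypothetical.

```python
def chain_metrics(is_search, dive_idx):
    """Summarize dives relative to chains of consecutive search fixes."""
    chains = []
    start = None
    for i, s in enumerate(is_search):
        if s and start is None:
            start = i                      # a new chain begins
        elif not s and start is not None:
            chains.append((start, i - 1))  # chain ended at the previous fix
            start = None
    if start is not None:
        chains.append((start, len(is_search) - 1))

    dives_in = sum(1 for d in dive_idx if is_search[d])
    dives_out = len(dive_idx) - dives_in
    empty = sum(1 for a, b in chains
                if not any(a <= d <= b for d in dive_idx))
    return {"chains": chains,
            "dives_in_search": dives_in,        # true positives
            "dives_outside_search": dives_out,  # false negatives
            "chains_without_dive": empty}       # false positives


# Hypothetical annotated track of ten fixes with three dives.
labels = [False, True, True, False, False, True, False, True, True, True]
metrics = chain_metrics(labels, dive_idx=[2, 3, 8])
```

Note that a chain may be a single fix long, as in the second chain of this example.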

| Kernel density
Time in space is considered to be a good proxy for foraging effort (Warwick-Evans et al., 2015). GPS locations (excluding locations within 5 km of the colony and locations at night) were used to estimate kernel densities in ArcMap 10.2, which uses a smoothing function based on the quartic kernel described by Silverman (1986), with a bandwidth (search distance) of 10 km. This was used to create a kernel density grid of square cells with sides of 10 km. The method produces a 10 × 10 km grid with the relative intensity of both TDR dives and GPS tracks.
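For readers unfamiliar with the quartic kernel, the sketch below evaluates a two-dimensional quartic (biweight) kernel density at a single location, using the standard form K(u) = (3/π)(1 − u²)² for u < 1. It is an illustration only: ArcMap's internal implementation may differ in scaling and edge handling, and the function name `quartic_density` and example points are assumptions.

```python
import math


def quartic_density(px, py, points, bandwidth):
    """Quartic kernel density at (px, py); units are points per squared distance unit."""
    total = 0.0
    for x, y in points:
        # Squared distance scaled by the bandwidth (search distance).
        u2 = ((px - x) ** 2 + (py - y) ** 2) / bandwidth ** 2
        if u2 < 1.0:
            total += (3.0 / math.pi) * (1.0 - u2) ** 2
    return total / bandwidth ** 2


# Hypothetical fixes in km; density evaluated at the centre of one grid cell.
fixes_km = [(0.0, 0.0), (3.0, 4.0), (25.0, 0.0)]
cell_density = quartic_density(0.0, 0.0, fixes_km, bandwidth=10.0)
```

Points farther than one bandwidth from the evaluation location contribute nothing, which is why the third fix in the example has no effect on `cell_density`.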
Dutilleul's modified spatial t test (Dutilleul, Clifford, Richardson, & Hemon, 1993) was used to determine the spatial correlation between the intensity of dives and intensity of tracks, accounting for spatial autocorrelation in the data.

| First passage time
passage time values was then used to determine appropriate search radii for each individual bird. The slowest sextile of passage times was considered to be relatively higher intensity search behavior as used by Nordstrom, Battaile, Cotte, and Trites (2013), and also indicated in work by Hamer et al. (2009) following Fauchald and Tveraa (2003).
Search radii were used to create an amalgamated area of search along an individual bird's track, with GPS points along this track treated as "search" points. Although FPT can be used to determine nested levels of area-restricted search (Hamer et al., 2009), we have considered only the highest levels of search behavior to maximize the number of dives potentially occurring within search.
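As a rough illustration of the quantity involved, first passage time at a fix is the time taken to cross a circle of a given radius centred on that fix, summed over the backward and forward directions along the track. The sketch below is a coarse approximation, not the procedure used in the study: it takes the first fix beyond the radius in each direction rather than interpolating the exact crossing point, and the function name `first_passage_time` and example track are hypothetical.

```python
import math


def first_passage_time(xs, ys, ts, i, radius):
    """Approximate FPT at fix i, or None if the track never leaves the circle both ways."""
    def exit_time(indices):
        for j in indices:
            if math.hypot(xs[j] - xs[i], ys[j] - ys[i]) > radius:
                return abs(ts[j] - ts[i])
        return None

    forward = exit_time(range(i + 1, len(xs)))
    backward = exit_time(range(i - 1, -1, -1))
    if forward is None or backward is None:
        return None
    return forward + backward


# Hypothetical straight track: 1000 m steps every 120 s.
xs = [0.0, 1000.0, 2000.0, 3000.0, 4000.0]
ys = [0.0] * 5
ts = [0.0, 120.0, 240.0, 360.0, 480.0]
fpt_mid = first_passage_time(xs, ys, ts, i=2, radius=1500.0)
```

Fixes near the start or end of a track often have no defined value because the track does not leave the circle in one direction.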

| k-Means clustering
k-Means clustering is a method of vector quantization that aims to partition n observations into k clusters, and has been used to cluster data points consistent with different behaviors (Jain, 2010). k- variance (Ketchen & Shook, 1996). This resulted in three clusters, and these were then assigned behavioral states based on logical differences between the means of variables in each group.
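The clustering step can be sketched with a plain implementation of Lloyd's algorithm. The input variables used in the study are not stated in the surviving text, so the features here (speed and absolute turning angle), the starting centres, and the function name `kmeans` are assumptions for illustration. In practice the variables would normally be standardized first so that no single metric dominates the distance.

```python
def kmeans(points, centroids, n_iter=50):
    """Lloyd's algorithm; `centroids` supplies the starting centres."""
    centroids = [tuple(c) for c in centroids]
    labels = []
    for _ in range(n_iter):
        # Assignment step: each point goes to its nearest centre.
        labels = [min(range(len(centroids)),
                      key=lambda k: sum((a - b) ** 2
                                        for a, b in zip(p, centroids[k])))
                  for p in points]
        # Update step: each centre moves to the mean of its members.
        new = []
        for k in range(len(centroids)):
            members = [p for p, lab in zip(points, labels) if lab == k]
            if members:
                new.append(tuple(sum(col) / len(members) for col in zip(*members)))
            else:
                new.append(centroids[k])
        if new == centroids:
            break
        centroids = new
    return labels, centroids


# Hypothetical fixes as (speed in m/s, absolute turning angle in rad).
fixes = [(15, 0.1), (14, 0.2), (16, 0.05),
         (5, 1.5), (6, 1.2), (4, 1.8),
         (0.5, 0.3), (0.3, 0.5), (0.6, 0.2)]
labels, centres = kmeans(fixes, centroids=[(15, 0.1), (5, 1.5), (0.5, 0.3)])
```

Behavioral names are assigned afterwards by inspecting the cluster means, for example fast and straight as travel, slower and sinuous as search, and near-stationary as rest.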

| Speed-tortuosity thresholds
Speed-tortuosity thresholds from Wakefield et al. (2013) were applied to the data. These were developed based on prior knowledge of gannet foraging behavior and an iterative examination of plausible thresholds of movement indices from those initially suggested by Grémillet et al. (2004). Thresholds suggested by Wakefield et al. (2013) were applied as they were based on data from tracked gannets from a range of colonies, including the data analyzed in this study.
Successive GPS locations were considered to represent search if they met any one of three conditions:

| Hidden Markov Models
Hidden Markov Models (HMM) are an example of state-space modeling, where models are formed of two parts, an observable series and a nonobservable state sequence (Langrock et al., 2012). The observable series, in this context, take the form of GPS relocations with consequential step length and turning angle, while the nonobservable are behavioral states. HMM use a time series to determine what denotes the underlying states and the changes between them. The application of state-switching models to movement data allows behavioral modes to be examined, while considering the high degree of autocorrelation present in telemetry data (Patterson et al., 2008). When the observational error is low, hidden Markov models offer a more trac-
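To make the two-part structure concrete, the sketch below decodes the most likely hidden state sequence for a series of step lengths using the Viterbi algorithm. It is illustrative only: the model parameters are taken as known, whereas in practice they are estimated from the data, and the Gaussian emission distribution, parameter values, and function names are assumptions rather than the model fitted in this study.

```python
import math


def log_normal_pdf(x, mean, sd):
    """Log density of a normal distribution."""
    return -0.5 * math.log(2 * math.pi * sd ** 2) - (x - mean) ** 2 / (2 * sd ** 2)


def viterbi(obs, init, trans, means, sds):
    """Most likely state path for observations under a Gaussian-emission HMM."""
    n_states = len(init)
    logv = [math.log(init[s]) + log_normal_pdf(obs[0], means[s], sds[s])
            for s in range(n_states)]
    back = []
    for x in obs[1:]:
        prev = logv
        ptr, logv = [], []
        for s in range(n_states):
            # Best previous state leading into state s.
            best = max(range(n_states),
                       key=lambda r: prev[r] + math.log(trans[r][s]))
            ptr.append(best)
            logv.append(prev[best] + math.log(trans[best][s])
                        + log_normal_pdf(x, means[s], sds[s]))
        back.append(ptr)
    # Trace back from the most likely final state.
    state = max(range(n_states), key=lambda s: logv[s])
    path = [state]
    for ptr in reversed(back):
        state = ptr[state]
        path.append(state)
    return path[::-1]


# Hypothetical step lengths (m per 2 min); state 0 = travel, state 1 = search.
step_lengths = [1800, 1700, 1750, 300, 250, 400, 350, 1600, 1900]
states = viterbi(step_lengths,
                 init=[0.5, 0.5],
                 trans=[[0.9, 0.1], [0.1, 0.9]],
                 means=[1700.0, 350.0],
                 sds=[200.0, 150.0])
```

The high self-transition probabilities encode the autocorrelation in behavior: a single ambiguous step is unlikely to flip the decoded state on its own.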

| Expectation-maximization binary clustering
Expectation-maximization binary clustering (EMbC) protocols are an unsupervised, multivariate example of a state-space modeling framework that can be used for behavioral annotation of movement trajectories, including search behavior (see Garriga, Palmer, Oltra, and Bartumeus (2016)). EMbC has been designed to be a simple method of analyzing movement data based on the geometry alone, and can behaviorally annotate movement data with minimal supervision.
EMbC is a relatively modern technique that is gaining traction within movement ecology. It has previously been used in a variety of movement studies, including exploring behavioral differences between distinct populations of the red-footed booby (Mendez et al., 2017) and coupling energy budgets with behavioral patterns under an optimal foraging framework (Louzao, Wiegand, Bartumeus, & Weimerskirch, 2014). Analysis was undertaken using the EMbC package in R (Garriga & Bartumeus, 2015), using calculated velocities and turning angles to infer behavioral classifications.

| Machine learning
While the methods outlined above all identify search behavior, machine learning models are trained to specifically identify prey capture/dive events based on track metrics. Analysis was undertaken using the caret package in R (Kuhn, 2008) using generalized boosted regression models to account for zero-inflation (Elith, Leathwick, & Hastie, 2008). Models were built using step length, speed, turning angle, hour of day, and tortuosity. Models were trained using 75% of the linked GPS/TDR dive data, with the remaining 25% of data kept for validation of predictions, and underwent cross-validation 500 times during the training procedure. By combining all individual animals' data in this manner, we ensure that any intra-individual variation is accounted for in the modeling process. Receiver operating characteristic (ROC) curves were calculated (Fielding & Bell, 1997) to determine the model of best fit at each colony.
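The data partitioning and ROC-based assessment can be sketched in a few lines. This does not reproduce the caret workflow or the boosted regression models themselves; it only shows a seeded 75/25 split and the area under the ROC curve computed as the probability that a randomly chosen dive fix scores higher than a randomly chosen non-dive fix. The function names and example values are hypothetical.

```python
import random


def train_test_split(n, train_frac=0.75, seed=1):
    """Return shuffled index lists for the training and test partitions."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    cut = int(round(train_frac * n))
    return idx[:cut], idx[cut:]


def roc_auc(labels, scores):
    """Area under the ROC curve via pairwise comparison (ties count as half)."""
    pos = [s for lab, s in zip(labels, scores) if lab == 1]
    neg = [s for lab, s in zip(labels, scores) if lab == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical per-fix dive labels (1 = dive) and model-predicted probabilities.
dive = [0, 0, 1, 0, 1, 1, 0, 0]
prob = [0.1, 0.3, 0.8, 0.2, 0.4, 0.9, 0.5, 0.05]
train_idx, test_idx = train_test_split(len(dive))
auc = roc_auc(dive, prob)
```

An AUC of 0.5 corresponds to a model no better than chance, while 1 indicates perfect separation of dive and non-dive fixes.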

| Comparison of methods using TDR dives
In order to compare the predictive power of the seven methods out- The correlation between kernel densities of GPS tracks and TDR dives was assessed using Dutilleul's modified spatial t test (Dutilleul et al., 1993). This analysis provides a correlation coefficient across the spatial extent of the tracked data to determine how well the two datasets correlate while accounting for spatial autocorrelation. Model performance for machine learning was assessed using kappa values, a measure of variability explained by the model akin to R² values, where 0 is equal to no relationship and 1 is equal to a perfect relationship as per Landis and Koch (1977). Further to this, a confusion matrix was calculated by running models on the remaining 25% test data to assess the number of correctly and incorrectly identified dives (Table 3).
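The kappa statistic referred to above can be computed directly from a two-by-two confusion matrix as (observed agreement − chance agreement) / (1 − chance agreement). The sketch below uses made-up counts, not results from this study, and the function name `cohens_kappa` is an assumption.

```python
def cohens_kappa(tp, fp, fn, tn):
    """Cohen's kappa for a 2 x 2 confusion matrix of predicted vs. observed dives."""
    n = tp + fp + fn + tn
    observed = (tp + tn) / n
    # Agreement expected by chance from the row and column totals.
    p_yes = ((tp + fp) / n) * ((tp + fn) / n)
    p_no = ((fn + tn) / n) * ((fp + tn) / n)
    expected = p_yes + p_no
    return (observed - expected) / (1.0 - expected)


# Hypothetical test-set counts: 20 dives correctly predicted, 10 false alarms,
# 5 missed dives, 65 correctly predicted non-dive fixes.
kappa = cohens_kappa(tp=20, fp=10, fn=5, tn=65)
```

Because dives are rare relative to non-dive fixes, kappa is more informative than raw accuracy, which can be high even when few dives are predicted correctly.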

| RESULTS
The performance of behavioral classification methods was assessed by comparing the occurrence of TDR dives inside and outside of predicted search behavior (Table 2).

Table footnotes: Machine learning and kernel density were assessed with other metrics due to the nature of the analysis; see Tables 3-5. (b) Nighttime locations and locations close to the colony have been omitted. The remaining proportion of relocations is considered to be a combination of rest and travel.

| DISCUSSION
does not always result in prey capture attempts. These findings suggest that significant effort is spent in unsuccessful search behavior, consistent with low prey encounter rates associated with foraging on spatially and temporally patchy prey resources (Weimerskirch, 2007).
While the spatial distribution of tracked gannets will encompass a variety of behaviors including foraging, travel, and rest periods, simpler methodologies such as kernel density estimation of track data correlated well with kernel densities of TDR dive events. This supports the assertion that time in area is a good proxy for foraging effort (Grémillet et al., 2004; Warwick-Evans et al., 2015). However, this approach utilizes larger areas of space beyond movement paths, and so it is not capable of identifying foraging in association with temporally ephemeral events or features that may directly change an animal's movement trajectory. Within more process-driven approaches, FPT is arguably one of the most ubiquitous methods used to identify foraging areas in both terrestrial and marine systems (Battaile, Nordstrom, Liebsch, & Trites, 2015; Byrne & Chamberlain, 2012; Evans et al., 2015; Hamer et al., 2009; Le Corre, Dussault, & Côté, 2014). FPT captures search behavior across multiple spatial scales and is particularly noted for its ability to detect nested scales of area-restricted search (Hamer et al., 2009). While we did not investigate nested scales of search, FPT, along with k-means clustering, had the lowest rate of dives occurring within broad areas of identified search. However, in contrast to k-means, speed-tortuosity thresholds "captured" 68% of TDR dives within areas identified as search. There is evidence to suggest that humans are more capable than machines at pattern recognition when presented with limited data (Samal & Iyengar, 1992). It is therefore unsurprising that thresholds performed well considering that they were constructed based on prior knowledge of foraging behavior and iterative examination of thresholds against a validation dataset in gannets.
The relatively high rates of false positives (66% of search chains containing no TDR dive) were within the spread of values for other methods, highlighting significant effort spent searching for prey interspersed with relatively few prey encounters.
The state-space modeling framework has been acknowledged as particularly useful in movement ecology (Patterson et al., 2008), and is rapidly expanding within path segmentation techniques (Michelot et al., 2016;Roberts & Rosenthal, 2004). Both the EMbC and HMM approaches model the changes in step length and turning angle through time and space to annotate the trajectory of an animal with behavioral states Michelot et al., 2016) it also had one of the lowest rates of false positives. Less than 20% of dives occurred outside of search. This would be more consistent with opportunistic foraging and provides further empirical evidence of search behavior leading to prey capture attempts (Dias, Granadeiro, & Palmeirim, 2009;Weimerskirch, Pinaud, Pawlowski, & Bost, 2007).
The high number of shorter search chains identified by EMbC, coupled with the fact that it is possible to link state transitions to environmental covariates in an HMM framework, suggests that both these methods may also be suitable for investigating behavioral response to ephemeral environmental cues.
Regional differences in habitat and prey, as well as inter- and intraspecific competition, are likely to influence the way an animal forages (Huig, Buijs, & Kleyheeg, 2016; Schultz, 1983; Zach & Falls, 1979). To account for this, the colonies were treated independently during analysis. Machine learning did highlight slight differences between colonies in the movement metrics considered to be of most predictive power, suggesting local differences in movement associated with foraging and search. Machine learning was the only method that directly predicted prey capture events rather than search behavior. While the explanatory power of the models was deemed to be satisfactory, the predictive ability of models was poor, searching (Bartoń & Hovestadt, 2013; Benhamou, 2004). However, slowing down and turning more could also be an indication of rest behavior, especially when considering potential error from closely positioned GPS relocations (Hurford, 2009; Jerde & Visscher, 2005).
The ability to exclude a period that closely resembles search patterns could have the potential to reduce false-positive periods of search, and we accounted for this as much as possible by removing locations in proximity to the colony as well as locations occurring at night before comparing methods. While not directly assigning a rest period, it is important to note that speed-tortuosity thresholds could be adapted to include the annotation of rest and travel, as well as specific search behavior. In a similar fashion, machine learning protocols could also be applied to predict behaviors other than diving.

FIGURE 2 Proportion of (a) TDR dives occurring within "search" behavior (true positives) and (b)
Careful choices must be made in the selection and application of behavioral classification methods when inferring foraging. While all methods tested generally supported the hypothesis that search behavior leads to prey encounter and subsequent prey capture attempts in a wide-ranging pelagic predator, there was considerable variation in the degree to which this was noted. The HMM method produced estimates of foraging behavior that most effectively encapsulated both search and prey capture components of foraging.
As such, it would seem a sensible recommendation that HMM be used when identifying foraging (including both search and prey capture) areas is a priority. Across methods, rates of false negatives (dives occurring outside of search behavior) ranged from 19% to 70%. While some of this may be attributed to opportunistic feeding outside of search behavior, methods with high rates of false negatives suggest that care should be taken when using behavioral classification methods. That animals spend considerable time actively searching for prey, while prey capture occurs largely outside of this activity, seems improbable, and poor classification of behaviors can have implications when considering time-energy budgets and subsequent reproductive success or survival. Methods such as HMM, EMbC, and thresholds had the lowest rates of dives occurring outside of search. These methods may be more attuned to capturing dive events and therefore represent a more inclusive definition of foraging, while FPT and k-means clustering may be more general in their identification of search. Investigating the differences between methods may lead to increased understanding of the environmental cues used by predators to initiate search and prey capture as well as the scales at which these cues occur. Nevertheless, we reiterate the need for detailed exploratory analysis of movement data to prevent mis-specification of behavior (Gurarie et al., 2016) and argue for methods to be chosen based on their suitability for the questions being asked by researchers.

ACKNOWLEDGMENTS
We would like to thank all those involved in fieldwork as well as land-

CONFLICT OF INTEREST
None declared.

AUTHOR CONTRIBUTIONS
AB, MJ, and SB conceived the initial ideas and designed methodology.
WJG undertook analysis for HMM, AB undertook remaining analysis. MJ, WJG, EW, TWB, SCV, and KH provided advice and guidance on analytical frameworks and manuscript preparation. AB and MJ led the writing of the manuscript. All authors contributed critically to the drafts and gave final approval for publication.

DATA ACCESSIBILITY
Data reported in this article are archived by BirdLife International (www.seabirdtracking.org).