Reconstructing bird trajectories from pressure and wind data using a highly optimized hidden Markov model

Tracking technologies have widely expanded our understanding of bird migration routes, destinations and underlying strategies. However, determining the entire trajectory of small birds equipped with lightweight geolocators remains a challenge. We develop a highly optimized hidden Markov model (HMM) for reconstructing bird trajectories. The observation model is defined by pressure and, optionally, light measurements, while the movement model incorporates wind data to constrain consecutive positions based on realistic airspeeds. To reduce the computational costs associated with a large state space, we prune the HMM states and transitions based on flight and observation constraints to efficiently model the entire trajectory. The approach presented is based on a mathematically exact procedure and is fast to compute. We demonstrate how to compute (1) the most likely trajectory, (2) the marginal probability map of each stationary period, (3) simulated trajectories and (4) the wind conditions (wind support/drift) encountered by the bird during each migratory flight. We construct a version of an HMM optimized for reconstructing a bird's migration trajectory based on lightweight geolocator data. To render this approach easily accessible to researchers, we designed a dedicated R package GeoPressureR (https://raphaelnussbaumer.com/GeoPressureR/).


| INTRODUC TI ON
Tracking technologies constitute an essential tool to better understand animal migratory movements and behaviour (Bridge et al., 2011;Kays et al., 2015). Indeed, by providing frequent information about an animal's position, they allow researchers to study the factors influencing the migratory route used (e.g. Briedis, Bauer, et al., 2020;Thorup et al., 2017). In addition, tracking a single individual during its full journey can reveal space-time interactions as well as cumulative effects over time (e.g. McKinnon et al., 2015).
Tagging small animals (<50 g) comes with strict requirements on the weight of the tracking device, limiting the type of technology used to lightweight devices. These devices are particularly suited to smaller bird species but can also be used more broadly on larger species in cases where satellite transmitters cannot be set.
The two main technologies currently allowing to precisely position small birds include archival GPS (e.g. Hallworth & Marra, 2015) and automatic telemetry (e.g. Taylor et al., 2017). However, these only provide a limited number of position estimates. Beyond these two technologies, lightweight geolocators can provide regular positions with larger uncertainty by measuring environmental variables for which the spatiotemporal variation is known. Among these, lightlevel geolocators are the oldest and most widespread technology (Wilson et al., 1992), along with devices measuring sea-surface temperature used mostly for marine wildlife (e.g. Nielsen et al., 2006).
More recently, location estimates based on atmospheric pressure sensors have been shown to provide positions with higher precision (Nussbaumer et al., 2023). This offers promising research avenues as pressure information becomes increasingly available with the use of multi-sensor geolocators Dhanjal-Adams et al., 2018;Liechti et al., 2018;Sjöberg et al., 2018). Yet, both light-and pressure-based location estimates suffer from large uncertainty, particularly during the short stopovers that constitute critical steps in the migration trajectory.
Trajectory estimation from imperfect position data is a wellknown problem in the animal movement literature and is typically modelled as a state-space model (SSM; e.g. Jonsen et al., 2013;Patterson et al., 2008), where the animal's position X t is defined as a Markov model called a process or movement model P X t | X t−1 . The model considers that X is unknown but is related to an observed variable Y (e.g. light, pressure, or temperature measurements) through an observation model P Y t | X t . SSMs for light-level geolocators and sea-surface temperature have been mostly developed in marine biology and solved with a Kalman filter (Nielsen et al., 2006;Sibert et al., 2003) or an unscented Kalman filter (Lam et al., 2008;Nielsen & Sibert, 2007). However, when the SSM is strongly nonlinear (e.g. with pressure geopositioning), a Kalman filter cannot be used and more computationally demanding Bayesian methods such as Markov Chain Monte Carlo (MCMC; Jonsen et al., 2005;Sumner et al., 2009) or particle filtering (Rakhimberdiev et al., 2015;Royer et al., 2005) must be employed. Such methods have been used successfully for reconstructing the trajectory of small birds using lightweight geolocation (e.g. Lisovski et al., 2020). However, their implementation can be challenging due to the complexity of choosing the type of Monte Carlo method, configuring the run (e.g. step size) and checking convergence. In this paper, we consider only the problem of estimating bird trajectories given the observations under a fixed process model. For the related problem of performing inference jointly over trajectories and unknown parameters of a process model, efficient routines for trajectory estimation can be nested within a larger inference procedure, for example, using MCMC.
Whenever the SSM can be discretized in space and time (e.g. Pedersen et al., 2011), it becomes a hidden Markov model (Rabiner, 1989). In such cases, the positions can be estimated exactly and efficiently with the forward-backward algorithm, also known as a two-pass recursive algorithm (Rabiner, 1989;Scott, 2002;Zucchini et al., 2017), and the mostly likely trajectory can be found with the Viterbi algorithm (Viterbi, 2006). These algorithms have been used in animal movement literature (Leos-Barajas & Michelot, 2018;McClintock et al., 2020;Patterson et al., 2016) but have rarely been applied to geolocator data (e.g. Bindoff et al., 2018). A major challenge in estimating the trajectory of a bird with this approach is the memory requirements to store the probabilities of every possible transition between pairs of locations using a data structure known as the trellis graph (e.g. Rabiner, 1989).
Using pressure data from the geolocator, the trajectory of the bird can be efficiently modelled as an hidden Markov model (HMM) because bird trajectories can be discretized in a limited number of periods (time) and grid cells (space). Unlike marine wildlife, most birds migrate by alternating active migratory flight with periods of residency (stationary periods), where their position can be assumed fixed at the precision level provided by light and pressure geopositioning (~10 km). In addition, identifying the exact duration of flights (using activity data or pressure) can inform the movement model of the SSM by constraining the distance between consecutive positions, assuming a distribution of ground speeds (e.g. Briedis, Beran, et al., 2020). In this context, wind data have recently been shown to considerably improve position estimates because of the strong influence of wind on a bird's ground speed (Werfeli et al., 2022).
In this study, we present a framework to reconstruct the trajectory of a bird based on pressure (and optionally light) data captured by geolocators. In this approach, we model the trajectory as a discrete SSM (or HMM) and optimize the trellis graph of the HMM, where nodes correspond to the position of the bird at a stationary period and edges represent transitions from one position to the next. We first explain how to efficiently build this trellis graph (Section 2.2) and then demonstrate how the graph can be used to This method is available as an R package, called GeoPressurer (https://rapha elnus sbaum er.com/GeoPr essur eR/), including a user manual, GeoPressureManual (https://rapha elnus sbaum er.com/GeoPr essur eManu al/) and an introductory code, GeoPressureTemplate (https://rapha elnus sbaum er.com/GeoPr essur eTemp late/).

| Data
To illustrate how this method is implemented, we use a geolocator dataset consisting of 16 tracks from nine species covering a variety of migration distances and speeds. The dataset and its basic processing are described in full detail in Nussbaumer et al. (2023) and briefly recalled below.
First, the accelerometer and pressure data of each track are used to identify (1) stationary periods where the bird is assumed to be at a fixed location (~1-10 km scale) and (2)  Windspeed is used to refine the likelihood of the transition between each stationary period (see Section 2.2.5 for more details).

| Building the trellis graph
We model the trajectory of a single bird with an HMM by discretizing the positions X 0 , … , X n of the bird in n + 1 stationary periods and relating these positions to the observations Y 0 , … , Y n of pressure (and optionally light) via the SSM The initial distribution P X 0 has little impact because the initial observation P Y 0 | X 0 strongly constrains inferences about X 0 , for example, by encoding the known location of initial capture and tagging.
As such, we assume P X 0 to have a uniform distribution over space.
The transition probabilities P X k | X k−1 will vary with time due to the

| Step 2: Pruning nodes with the movement model
We compute the maximum flight distance possible between each stationary period D(k, k + 1) using the flight duration estimated from pressure data (see Section 2.1) and a ground speed threshold of 150 km/h. This threshold allows for exceptionally high speeds as the 95th percentile of ground speed for small migratory birds is below 90 km/h (Liechti & Bruderer, 1995). Nodes that are too far from any other nodes from either the previous or the following stationary period are pruned.
For computational feasibility, it is critical to perform this pruning without doing a full pairwise comparison among all nodes.
More specifically, let m k (m k < n lat n lon ) be the number of active (i.e. (1)

| 1121
Methods in Ecology and Evoluঞon NUSSBAUMER et al.
non-pruned) nodes for stationary period k after Step 1. A naïve algorithm to execute the pruning of this step would require the order of m k m k+1 computational steps, for example, by checking if each active node in stationary period k is within the maximum flight distance of at least one active node in stationary period k + 1. However, we perform the same operation in time proportional only to m k (i.e. time F I G U R E 1 Illustration of the creation of the graph. (a) The schematic example considers a simple trajectory from an equipment site (A 0 ) to a retrieval site (K 3 ) in three flights and two stationary periods. Nodes of different stationary periods are overlayed on the same spatial grid (right panel) (but distinguishable by subscript and shape).
Step 1. We keep all nodes corresponding to possible positions according to the 99th percentile of the likelihood map. We do not illustrate the static probability map for the schematic example.
Step 2. We eliminate nodes that are not within reach (<150 km/h) of at least one node from the previous and following stationary period. B 1 is eliminated on the forward pass, G 2 and C 1 are eliminated on the backward pass.
Step 3. We construct all possible transitions (red and orange lines).
Step 4. After computing the average windspeed and airspeed for each transition, we prune edges requiring an airspeed >100 km/h (orange dotted line). The final graph is thus composed of the red nodes and edge. (b) An example of the application of the method using the same subset of stationary periods as in fig. 4 in Nussbaumer et al. (2023). linear in the currently active number of nodes), which enables a significant speedup. We do this by using a fast algorithm for distance transformations in binary images (i.e. 0/1-valued; e.g. Kolountzakis & Kutulakos, 1992;Maurer et al., 2003). A distance transformation computes the distance between every pixel and the closest pixel with a '1' value, which, in our setting, represents currently active nodes in stationary period k. The algorithm uses the Euclidian distance on the image coordinates (i.e. lat-lon). These distances in decimal degrees are converted into kilometres assuming a flat-earth surface while accounting for the variation of distance between me- Using the distance transformation from the active nodes in stationary period k, we prune the active nodes for stationary period k + 1 requiring a flight distance higher than the maximum flight distance D(k, k + 1). This operation is repeated for all pairs of stationary periods going forward and then backward to ensure that all nodes are connected to the first and last nodes (equipment and retrieval).
This efficient pruning relies on the specific structure of the HMM, in which the states are located on a regular grid, allowing them to be treated as a binary image.

| Step 3: Computing edges with ground speed
Having identified and pruned the nodes, we now compute all edges between the remaining nodes. We compute the ground speed for each edge using the great circle distance and the duration of the flight.
Then, we filter the edges with the threshold of 150 km/h, which also removes some nodes as the flat-earth distance used in Step 2 was approximate. Finally, we prune the graph based on the constraint that all nodes must be connected to both the equipment and retrieval nodes using the breadth-first search algorithm (Cormen et al., 2022).

| Step 4: Computing edges with airspeed
We compute the average windspeed and airspeed of all edges and prune the graph based on a predefined airspeed threshold of 100 km/h, which considers potential uncertainty in wind estimates as the 95th percentile of airspeed for small bird migrants is below 75 km/h (Liechti & Bruderer, 1995).
To account for spatiotemporal change in windspeed at short temporal scales (especially with altitude), we take advantage of knowing, for each edge of the graph, (1) the exact time of departure and arrival (see Section 2.1), (2) the location of departure and arrival, and (3) the geolocator pressure measurements during the flight, which correspond to the bird's altitude.
We use the E/W and N/S windspeed components from the ERA5 pressure levels dataset (Hersbach et al., 2018a), which provide hourly wind estimates at a spatial resolution of 0.25° in latitude and longitude at 37 different pressure levels (i.e. altitudes). We then proceed to build two 4D gridded linear interpolations (E/W and N/S) that provide an estimate of windspeed for any time, latitude, longitude and pressure level.
For each edge of the graph, we estimate the average windspeed encountered by the bird. We construct the flight trajectory of the bird with discrete 4D positions (time-latitude-longitude-pressure level) as follows: • We discretise the trajectory on an hourly basis starting on the hour preceding flight departure and ending on the hour following arrival.
• The latitude and longitude positions are estimated assuming a linear displacement (i.e. "as the crow flies") between the departure and the arrival nodes.
• The pressure level positions are retrieved from the geolocator pressure measurements.
Windspeed significantly contributes to the displacement of the bird and can vary considerably within a flight. We, therefore, correct the horizontal position of the bird to assume a constant airspeed rather than a constant ground speed which would correspond to equally spaced hourly positions. We use an approximate two-step approach.
In the first step, we initialize the position assuming a constant ground speed G and query the corresponding hourly Finally, we prune the graph similarly to the previous step, this time based on an airspeed threshold of 100 km/h.

| Step 5: Computing the transition probabilities
In the final step, we compute the transition probability P X t | X t−1 (i.e. movement model) for each edge from the airspeed. We define a parametric function converting the average airspeed into a probability using flight energetics and considering that a bird is more likely to fly at an airspeed resulting in lower energy consumption. More specifically, we empirically define the probability proportional to the cubic inverse of the mechanical power of flight P mech of the airspeed v a , where v a x k−1 → x k is the airspeed computed in step 4 for the edge of the transition from the node x k−1 to the node x k . The mechanical power is computed with FlightMAT (Nussbaumer, 2022) following Pennycuick (2008), and taking into account each species' mass, wingspan and wing area from Tobias et al. (2022). We additionally impose a minimum apparent airspeed of v lim = 5 km/h, which allows for birds to perform short exploratory flights (e.g. Mills et al., 2011;Schmaljohann et al., 2011).

| Trajectory outcomes
Using the constructed graph, we derive the following three outcomes.

| Most likely trajectory
While the mean or median positions of each stationary period are commonly used to illustrate a single trajectory, they do not adequately represent the connectivity between stationary periods, even more so when the probability map is multi-modal. Instead, we suggest using the most likely trajectory, corresponding to the trajectory maximizing the joint probability, Within graph theory, the shortest path problem is a well-known computational task seeking to identify the path characterized by the smallest sum of weights of all its edges. The most likely trajectory is equivalent to the shortest path when the weights of the edges are defined as the negative log of the probability. Using this formulation, we compute the most likely trajectory using the shortest path search with Dijkstra's algorithm (Dijkstra, 1959). This operation produces the same result as the Viterbi algorithm for finding the single best state sequence in an HMM (Rabiner, 1989;Viterbi, 2006

| Marginal probability maps
We take advantage of the compact sparse graph to compute the marginal probability map of each stationary period, which is represented by the distribution P X t | Y 0 , … , Y n for each time step t. We use the forward-backward algorithm (e.g. Rabiner, 1989;Zucchini et al., 2017), which is a special case of the sum-product algorithm (e.g. Ross, 2019) tailored to HMMs.
We briefly explain how the forward-backward algorithm is applied in our case. For convenience, we group the transition probabilities P X t | X t−1 into a transition matrix T t (m × m), and all observation likelihoods P Y t = y t | X t = x t for all values of x t and for the observed can be computed as

| Random trajectories generator
To compute quantities relying on the entire trajectory such as the total distance flown or the average airspeed, it is most convenient to generate multiple possible trajectories and compute aggregated quantities on each one.
We use the forward filtering backward sampling (e.g. Carter & Kohn, 1994) to generate a possible trajectory x = x 0 , … , x n using the following steps: 1. Fix the known position of retrieval x n , or if unknown, sample from n .
2. For each stationary period k < n, in a backward order, 1. Sample a position x k given x k+1 from the probability vector proportional to f k • 1 x k+1 T k , where the operator 1 x k+1 is used to extract the row of T k corresponding to x k+1 . The forward vector f k is assumed to be already computed from Section 2.3.2.

| Wind analysis
The resulting position estimates are precise and accurate enough in relation to the spatial variation of wind to allow for studying wind conditions during flight. As ground speed, windspeed and airspeed have already been computed for each edge of the graph, we can easily extract and summarize their distributions from the randomly simulated arg max (3) trajectories (Section 2.3.3). In addition to ground speed, airspeed and windspeed, we compute the distribution of wind support and drift by calculating the windspeed projection along each individual flight's orientation (wind support) and perpendicular to it (drift).
We qualitatively assess the relationship between each flight in terms of distance and orientation, as well as the direction and magnitude of wind. In addition, we characterize the overall effect of wind on the full trajectory by integrating the ground speed, windspeed, airspeed, wind support and drift over all flight durations to quantify the total displacement caused by each one.

| Trajectory outcomes
The most likely trajectory, the marginal probability maps, and the simulated trajectories are illustrated in Figure 2  shows an irregular shape (i.e. not ellipsoidal) due to the pressure threshold and land mask (e.g. stopover in Italy for 18IC). The uncertainty is generally significantly smaller for long stationary periods (e.g. stopover in northern Tunisia for 18IC). The most likely trajectory connects each stationary period by finding the optimal compromise between generally minimizing the flight distance while preserving the stopover location in the most likely area. The most likely trajectory is expected to produce a realistic full trajectory, which is not guaranteed by alternatives such as using the average or median position of the marginal probability at each stationary period. The simulations, on the other hand, are useful to illustrate all the possible trajectories that the bird might have taken and thus visually represent the uncertainty in the overall trajectory.

| Wind analysis
Using the high-resolution trajectories produced, we can quantify the speed and direction of wind experienced during each flight. wind support tends to be associated with longer flights and longer distance travelled. Northward migration (i.e. temperate spring except for intra-African migrants) is performed in fewer and longer flights as birds experience stronger wind support than during southward migration.
Interestingly, most detours in the trajectory can be explained by birds drifting while following the direction of strong supporting winds, such as the detour via Libya by 18IC, or the detour via western Algeria by 16DM, both during pre-breeding migration.
As illustrated above, wind can help birds reach their destination faster, but also cause lateral drift. We quantified the displacement due to the wind (windspeed) and due to the bird's own power Summing the displacement due to wind and airspeed for all flights in spring and autumn separately, we can compare the influence of wind between season and species. Except for the Eurasian Nightjar, long-distance migrants tend to be more efficient at maximizing wind support while minimizing drift compared to short-distance migrants ( Figure 4b). In autumn, drift tends to be larger, and wind support smaller, than in spring while the overall distance travelled is greater.

| Model strengths
In this paper, we present an approach to estimate the full migratory trajectory of a bird equipped with a lightweight geolocator.
We model the trajectory as an HMM using a trellis graph, which, thanks to its compact format, allows us to efficiently build and store the full probability distribution of the trajectory. The model allows us to make inferences about the trajectory -such as computing the most likely trajectory or marginal probability maps -in In addition, the graph structure allows us to efficiently account for wind data to refine the possible distance covered by a bird, and ultimately improve the accuracy and precision of the trajectory.
Compared to Werfeli et al. (2022), the computation of wind is further improved by integrating the variation of windspeed over time, space, and altitude encountered throughout the flight.

| Data requirements
This approach was developed and optimized for data collected from lightweight pressure sensors equipped on small birds. However, it could be applied to other datasets, under certain conditions. The observation model can incorporate any data that provide a position estimate. For instance, it can be based on light data only, possibly requiring a lower grid resolution. The approach can also integrate external information, such as field observations of the equipped F I G U R E 4 (a) Distance travelled by the Great Reed Warbler (18LX) propelled by wind (dotted line) and its own airspeed (continuous line) for each flight. The longest flight is highlighted in red to illustrate that the impressive ground speed (135 km/h) is mostly explained by the high windspeed (81 km/h). Similar maps for all 16 tracks are available in Supporting Information 3. (b) Illustration of the wind triangle of distance (rather than the usual wind triangle of speed) for spring (left) and autumn (right) migration. The total distance travelled (i.e. sum of all individual flights) (dot ~ ground speed) is the vectorial sum of the wind (dotted line ~ windspeed) and bird power (continuous line ~ airspeed). This representation allows to visually appraise the relative influence of the bird's drift (x-axis) and wind support (projection of the dotted line on the y-axis). bird or data from archival GPS or telemetry. The method does not require knowledge of the position at the first or last stationary period (e.g. 22BK); however, this information can drastically reduce the size of the trellis graph. Additionally, the data collected by the device must provide the information needed to identify stationary periods and estimate flight duration. In the case of bird migration, this can be achieved using pressure measurements alone thanks to their high-altitude flight (e.g. Rhyne & Nussbaumer, 2022;Rime & Nussbaumer, 2022). Accelerometer data can be particularly helpful for this task because of its high temporal resolution. Wind data provide a facultative but significant improvement in the accuracy of possible flight distance. In the absence of wind data, the movement model (Section 2.2.5) should be replaced by a parametric equation of the ground speed (e.g. Briedis, Beran, et al., 2020).
Beyond birds, this approach could in theory be used to model the trajectory of other animals, provided they alternate between short periods of movement and extended periods of stationarity, where the animal can be assumed to remain in the same position relative to the grid resolution. Indeed, the modelling approach relies on the location of the bird being discretized in space and time over a finite number of stationary periods. This is typically not the case among most marine wildlife nor mammals.
For this approach to be applied successfully in other studies, the importance of high-quality labeling of pressure cannot be overstated (Nussbaumer et al., 2023), as a small labeling error could result in erroneous trajectory estimation. To avoid this, we recommend using the R Shiny app GeoPressureViz (https://rapha elnus sbaum er.com/ GeoPr essur eManu al/geopr essur eviz.html), which helps researchers to visualize the full trajectory of the bird and validate the labeling and overall coherence of the likelihood maps with the movement model.

| Wind analysis
With precise position estimates for each stationary period and the high-resolution windspeed database, we can estimate the speed and direction of wind experienced by a bird with relatively high confidence. Consequently, wind support, airspeed and wind compensation can all be quantified on an individual level for small-bodied passerines. Our preliminary results from these 16 tracks qualitatively illustrate the significance of wind strength and direction in explaining flight distance, duration, and even the migration trajectory. As the objective of this paper is to describe the method used to reconstruct the trajectory, we do not investigate further ecological research questions here.

| GeoPressureR
Recent years have seen a growing number of studies using multisensor geolocators to track small-bodied passerine migrants (e.g. Liechti et al., 2018;Meier et al., 2018;Sjöberg et al., 2021). To assist researchers in applying this method to their own study, we developed the R package GeoPressurer (https://rapha elnus sbaum er.com/GeoPr essur eR/) to (1) compute positions based on pressure (Nussbaumer et al., 2023), (2) build the graph and (3) compute the four outputs of this study. The package is accompanied by the user guide GeoPres sureManual (https://rapha elnus sbaum er.com/GeoPr essur eManu al/) providing step-by-step explanations using the example of the Great Reed Warbler (18LX). Furthermore, the GitHub template repository GeoPressureTemplate (https://github.com/Rafnu ss/GeoPr essur eTemp late) helps researchers kick-start their study with a pre-built folder structure, R code, and an automatically generated website report. Together, these tools aim to make the method described above accessible to all researchers and applied to a wide range of species, including birds and bats.

CO N FLI C T O F I NTER E S T S TATEM ENT
The authors have no conflicts of interest to declare.

PE E R R E V I E W
The peer review history for this article is available at https:// w w w.w e b o f s c i e n c e . c o m /a p i /g a t e w a y/ w o s /p e e r-r e v i e w/10.1111/2041-210X.14082.