NAPS: Integrating pose estimation and tag-based tracking

1. Significant advances in computational ethology have allowed the quantification of behaviour in unprecedented detail. Tracking animals in social groups, however, remains challenging, as most existing methods can either capture pose or robustly retain individual identity over time, but not both.
2. To capture finely resolved behaviours while maintaining individual identity, we built NAPS (NAPS is ArUco Plus SLEAP), a hybrid tracking framework that combines state-of-the-art, deep learning-based methods for pose estimation (SLEAP) with unique markers for identity persistence (ArUco). We show that this framework allows the exploration of the social dynamics of the common eastern bumblebee (Bombus impatiens).
3. We provide a stand-alone Python package for implementing this framework, along with detailed documentation to allow for easy utilization and expansion. We show that NAPS can scale to long-timescale experiments at a high frame rate and that it enables the investigation of detailed behavioural variation within individuals in a group.
4. Expanding the toolkit for capturing the constituent behaviours of social groups is essential for understanding the structure and dynamics of social networks. NAPS provides a key tool for capturing these behaviours and can provide critical data for understanding how individual variation influences collective dynamics.


Size Adjustment
Choosing an optimal window size in a matching algorithm is a key decision that governs the trade-off between fidelity (the ability to maintain accurate tracking of identities) and responsiveness (the ability to respond quickly to identity swaps). This trade-off is particularly relevant when tracking multiple individuals in a video, such as bees in a colony, where identity swaps and other confusions can occur. We use the Kuhn-Munkres algorithm to assign identities based on the ArUco tags detected on the individuals. In ideal conditions, correct assignments would correspond to the diagonal entries of the cost matrices $C_{i,j}(t)$ shown below having the lowest values. In practice, however, the inputs are imperfect: SLEAP may occasionally assign nodes to incorrect tracks, and tags can be misread or not read at all in some frames because of occlusions or other issues. The choice of window size can therefore have a substantial effect on the resulting assignments and their accuracy.
In our framework, we construct a cost matrix $C_{i,j}(t)$ for each frame $t$ from the binary matrix of ArUco tag and SLEAP instance coincidences, $I_{i,j}(t)$. Each element of the cost matrix is the negative sum of the coincidences over an overlapping window of size $2w + 1$ centered on the frame:
$$C_{i,j}(t) = -\sum_{k=t-w}^{t+w} I_{i,j}(k)$$
We assign IDs to each instance by finding the assignment that minimizes the total cost, using the Kuhn-Munkres algorithm as implemented in SciPy:
$$\hat{A}(t) = \underset{A}{\arg\min} \sum_{i}\sum_{j} A_{i,j}\, C_{i,j}(t)$$
where $A$ ranges over permutation matrices ($A_{i,j} = 1$ if the pairing of $i$ and $j$ is selected and 0 otherwise, with exactly one 1 in each row and column).
Consider three different window sizes and their impact on the matching process:
Small window size (3 frames; w=1). With a small window size, the algorithm can respond quickly to identity swaps, but short-term ambiguities in tag readings can also lead to more incorrect assignments. The cost matrix obtained may look like: Here, because of the small window size, the first instance is correctly assigned to the first tag, but the second instance is incorrectly assigned to the third tag and the third instance to the second tag. This deviation from the ideal diagonal alignment shows how small windows can lead to misassignments.
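The cost matrix and assignment described above can be sketched in Python with NumPy and SciPy's `scipy.optimize.linear_sum_assignment`, which solves the same linear assignment problem. This is a minimal illustration under stated assumptions, not the NAPS implementation: the array layout `(n_frames, n_instances, n_tags)`, the truncation of windows at the start and end of the video, the `-1` marker for unassigned instances, and the function names `windowed_costs` and `assign_ids` are all hypothetical.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment


def windowed_costs(coincidence: np.ndarray, w: int) -> np.ndarray:
    """Return C(t) = -sum_{k=t-w}^{t+w} I(k) for every frame t.

    Assumed layout: `coincidence` has shape (n_frames, n_instances, n_tags),
    and entry [k, i, j] is 1 if ArUco tag j was read inside SLEAP instance i
    in frame k. Windows are truncated at the video edges (an assumption).
    """
    n_frames = coincidence.shape[0]
    # Prefix sums: csum[k] holds the sum of frames 0..k-1, so every
    # windowed sum costs O(1) instead of O(w).
    csum = np.concatenate(
        [np.zeros_like(coincidence[:1]), np.cumsum(coincidence, axis=0)]
    )
    t = np.arange(n_frames)
    lo = np.clip(t - w, 0, n_frames)
    hi = np.clip(t + w + 1, 0, n_frames)
    return -(csum[hi] - csum[lo])


def assign_ids(coincidence: np.ndarray, w: int) -> np.ndarray:
    """Solve min_A sum_ij A_ij C_ij(t) independently for each frame t.

    Returns an (n_frames, n_instances) array whose entry [t, i] is the tag
    index assigned to instance i in frame t; -1 (hypothetical convention)
    marks instances left unassigned when there are more instances than tags.
    """
    costs = windowed_costs(coincidence, w)
    n_frames, n_instances, _ = costs.shape
    tags = np.full((n_frames, n_instances), -1, dtype=int)
    for t in range(n_frames):
        rows, cols = linear_sum_assignment(costs[t])
        tags[t, rows] = cols
    return tags
```

The prefix-sum construction keeps the cost of building all windows independent of `w`, which matters for long recordings with large windows. Note that an instance with no tag readings anywhere in its window has an all-zero cost row, so its assignment is arbitrary; a real pipeline would presumably need to flag such cases, which this sketch does not do.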
Medium window size (41 frames; w=20). Increasing the window size mitigates the impact of missed tag readings but also delays the algorithm's response to true identity swaps. The cost matrix obtained may look like: Despite the missed tag readings, the larger window size allows correct assignments because the correct identifications outweigh the missed ones over the window of frames.
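The contrast between the small and medium windows can be reproduced on a toy example. The sketch below builds a hypothetical, synthetic 101-frame coincidence array for three bees (not data from the experiments) in which instances 1 and 2 are occluded around frame 50 and, in frame 50 itself, their tags are read inside each other's instance. The helper `assign_at` is hypothetical; it applies the windowed cost and SciPy assignment described above to a single frame, with windows truncated at the video edges as an assumption.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment


def assign_at(coincidence, t, w):
    """Hypothetical helper: assign instances to tags at frame t with a 2w+1 window."""
    lo, hi = max(0, t - w), min(len(coincidence), t + w + 1)
    # Cost = negative sum of coincidences over the window.
    rows, cols = linear_sum_assignment(-coincidence[lo:hi].sum(axis=0))
    return {int(i): int(j) for i, j in zip(rows, cols)}


# Synthetic data: shape (frames, instances, tags); tag i is normally read in instance i.
n_frames, n = 101, 3
I = np.zeros((n_frames, n, n), dtype=int)
I[:, np.arange(n), np.arange(n)] = 1

# Instances 1 and 2 are occluded in frames 49-51 ...
I[49:52, 1, 1] = 0
I[49:52, 2, 2] = 0
# ... and in frame 50 their tags are misread as each other's.
I[50, 1, 2] = 1
I[50, 2, 1] = 1

for w in (1, 20):
    print(w, assign_at(I, t=50, w=w))
```

With w=1 the 3-frame window around frame 50 contains only the misread, so instances 1 and 2 are assigned tags 2 and 1, the off-diagonal pattern described for the small window. With w=20, each of tags 1 and 2 is read correctly in 38 of the 41 frames, which outweighs the single misread, and the diagonal assignment is recovered.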
Large window size (101 frames; w=50). With a large window size, the algorithm maintains identity assignments with high fidelity, even when many tag readings are missed. However, it can be slow to respond to rapid identity swaps. The cost matrix obtained may look like: Despite a substantial number of missed or incorrectly attributed tag readings, the large window size yields correct assignments because the correct identifications far outweigh the erroneous ones over the large window of frames.
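The responsiveness cost of a large window can be illustrated in the same way. Here the hypothetical, synthetic data contain a genuine 10-frame identity swap: SLEAP exchanges tracks 0 and 1 for frames 45 to 54, so during those frames each tag is read inside the other track. The helper `assign_at` is again hypothetical and uses the same windowed cost and truncated-window assumption as above.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment


def assign_at(coincidence, t, w):
    """Hypothetical helper: assign instances to tags at frame t with a 2w+1 window."""
    lo, hi = max(0, t - w), min(len(coincidence), t + w + 1)
    rows, cols = linear_sum_assignment(-coincidence[lo:hi].sum(axis=0))
    return {int(i): int(j) for i, j in zip(rows, cols)}


# Synthetic data: shape (frames, instances, tags); tag i is normally read in instance i.
n_frames, n = 101, 3
I = np.zeros((n_frames, n, n), dtype=int)
I[:, np.arange(n), np.arange(n)] = 1

# Genuine swap: tracks 0 and 1 exchange animals for frames 45-54,
# so tag 1 is read inside instance 0 and tag 0 inside instance 1.
swap = slice(45, 55)
I[swap, 0, 0] = 0
I[swap, 1, 1] = 0
I[swap, 0, 1] = 1
I[swap, 1, 0] = 1

for w in (1, 50):
    print(w, assign_at(I, t=50, w=w))
```

At frame 50, the 3-frame window (w=1) lies entirely inside the swap and returns {0: 1, 1: 0, 2: 2}, correctly following the exchanged identities. The 101-frame window (w=50) is dominated by the 91 unswapped frames and keeps the diagonal assignment, so the short swap goes uncorrected.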

Figures

Table S1.
This table presents a comparison of multi-animal tracking software. The "Direct CNN-based Identification" column denotes the software's use of a trained Convolutional Neural Network (CNN) to directly identify individuals. The "Marker-based Identification" column indicates the software's reliance on external markers for identifying animals. "Pose Estimation" represents the software's ability to predict the pose of an animal's body parts beyond a single point or blob features. NAPS is unique among these in that it uses marker-based identification and also provides postural data.