Application of a novel deep learning–based 3D videography workflow to bat flight

Studying the detailed biomechanics of flying animals requires accurate three‐dimensional coordinates for key anatomical landmarks. Traditionally, this relies on manually digitizing animal videos, a labor‐intensive task that scales poorly with increasing framerates and numbers of cameras. Here, we present a workflow that combines deep learning–powered automatic digitization with filtering and correction of mislabeled points using quality metrics from deep learning and 3D reconstruction. We tested our workflow using a particularly challenging scenario: bat flight. First, we documented four bats flying steadily in a 2 m³ wind tunnel test section. Wing kinematic parameters resulting from manually digitizing bats with markers applied to anatomical landmarks were not significantly different from those resulting from applying our workflow to the same bats without markers for five out of six parameters. Second, we compared coordinates from manual digitization against those yielded via our workflow for bats flying freely in a 344 m³ enclosure. Average distance between coordinates from our workflow and those from manual digitization was less than a millimeter larger than the average human‐to‐human coordinate distance. The improved efficiency of our workflow has the potential to increase the scalability of studies on animal flight biomechanics.


INTRODUCTION
Étienne-Jules Marey called motion the most apparent characteristic of life. 1 As such, our ability to capture movement is one of the most important tools in comparative biomechanics. Indeed, advances in this field have gone hand in hand with technical advances in videography. 2,3 A common approach is to manually digitize the positions of anatomical landmarks in video frames from multiple calibrated cameras. 4-10 The reliance on manual digitizing inherent to this approach, in combination with the high framerates of modern cameras and the expansion of studies to higher numbers of trials, results in an exceptionally large workload for video processing. The use of human digitizers, therefore, limits the scalability of experiments. Consequently, there is a pressing need to streamline the motion-to-data pipeline, especially in the case of 3D datasets with high degrees of occlusion, to reduce processing time and increase our ability to analyze larger datasets. 4
Artificial intelligence (AI)-based methods for measuring animal behavior, including locomotor movement, have recently made great strides. For example, the widely used Python package DeepLabCut (DLC) 11 has been successfully applied to humans, 12 rats, 13 cheetahs, 14 macaques, 15 chimpanzees, 16 spiders, 17 and more. 18 In contrast, DLC has not been widely applied to animals in flight. 3 For bats in particular, published research using DLC is limited to the movement of neural recording devices attached to horizontally climbing, socially interacting bats, 19 and the intralaryngeal dynamics of sound production. 20 Therefore, to our knowledge, deep learning-based tracking methods have seen little to no application for the study of animal flight.
Bats present a distinctive challenge for AI-based motion tracking, as well as for manual tracking. Flying animals can present a wide array of orientations and distances to the camera, leading to substantial variation in their appearance in video, which in turn makes consistent tracking difficult. Bats are especially challenging because of their highly compliant wings, which can assume a variety of forms 21 with up to 50 degrees of freedom due to their numerous joints. 22 These factors not only further increase the variation in appearance in footage, but also often lead to wing configurations where parts of the wings are obscured or difficult to tell apart. In laboratory settings, these challenges have often been mitigated by applying markers to anatomical landmarks (e.g., Refs. 7, 23, and 24). While this method has proven useful, it is not always feasible or desirable, particularly when studying bats in more natural environments or in field-based flight arenas. In such settings, minimizing interference with the animals is a high priority due to time constraints and a desire to minimize stress to the animals. Together, these factors make bats one of the more difficult use cases for AI-based motion tracking.
While DLC supports the processing of videos from stereo-camera setups, 18 its 3D capabilities are not yet as comprehensive as those of some other applications such as DLTdv, 5 which allows for a greater number of cameras, simultaneous digitization of frames from multiple cameras, and more flexibility in calibration procedures. 25 Conversely, though DLTdv offers some capabilities for training and utilizing deep learning models, DLC offers a larger selection of underlying neural networks, has better support for training against larger datasets pulled from many individuals, and has a larger user community for markerless pose estimation than DLTdv (T.L. Hedrick, personal communication, 2024). Both DLC and DLTdv are widely used tools: DLC more so in the neuroscience community and DLTdv for the study of biomechanics. 3 However, to our knowledge, no prior efforts have succeeded in integrating the robust 3D capabilities of DLTdv with the machine learning power of DLC for efficient and precise studies in animal biomechanics.
Here, we describe a cohesive workflow that combines DLC and DLTdv for biomechanical studies of bat flight. We use DLC to train deep learning networks that detect 16 anatomical landmarks on the bat in each frame. We use DLTdv and its built-in functions for the manual digitizing required to train the deep learning network, 3D calibration, and reconstruction of 3D points. In addition, we developed custom MATLAB functions that utilize the DLC confidence score and the DLTdv reprojection error to rectify correct but mirrored DLC landmark predictions (e.g., mistaking the left wingtip for the right wingtip and vice versa). All custom scripts used to integrate these tools, along with deep learning networks and training data, are provided as Supplementary Material (see section "Code and data" in Supporting Information).
We use two case studies to assess the effectiveness of our workflow. The first case study compares a traditional workflow, in which humans digitize trials of bats flown in a wind tunnel with markers, against our new automated workflow applied to the same bats flown under the same conditions but without markers. The second case study examines more complex maneuvers of bats flying in a large outdoor flight arena.
We also compare how different processing steps affect the accuracy of results, including data augmentation and our approach to filtering 3D tracks through the integration of DLC and DLTdv confidence metrics. Broadly, we show that our novel markerless approach generates results with accuracy similar to purely manual approaches, while dramatically reducing video processing time and eliminating the need to apply markers to animals.
As a measure of the potential of our workflow to reduce the labor of manual digitization, we highlight that for this study, we obtained over 16,000 automatically digitized frames from 1100 manually digitized frames of training data. Based on the digitizing pace of a trained student and an 8-h workday, we estimate that we achieved the equivalent of around 39 days of manual digitizing from less than 6 days spent digitizing. It is our hope that this workflow, along with the accompanying code and bat-specific deep learning networks, will benefit future studies by substantially reducing the need for additional training data and thereby facilitating further advances in animal flight research.

Overview
Our workflow consists of using DLC 11 to detect landmarks in 2D in video frames (Figure 1A,B), followed by applying functions from DLTdv 5 to reconstruct the 3D positions of the landmarks. We reject bad detections from individual cameras based on low DLC detection confidence or poor 3D residuals of candidate points (Figure 1B,C), and, based on the same parameters, we correct instances where the network incorrectly swapped the positions of paired symmetrical points, such as wingtips (Figure 1C). Lastly, we apply a final set of processing steps to generate smoothed trajectories of each anatomical landmark (Figure 1D), from which further kinematic parameters can be calculated. The custom code for this project was written in MATLAB (version R2022b Update 2, MathWorks) and includes scripts for 3D reconstruction, calculation of wing kinematic parameters, and quantification of 3D accuracy. Code related to DLC, such as creating projects, creating networks, predicting 2D coordinates, and analyzing 2D accuracy, was written in Python (version 3.8.13). We describe the workflow in greater detail below and then describe how we measure the accuracy of the workflow in two case studies (see section "Case studies").

F I G U R E 1 Repeating this procedure for each frame generates 3D trajectories of anatomical landmarks, enabling kinematic analysis. Shown here are resulting wingtip trajectories as well as a triangular mesh depicting a bat reconstructed from landmark 3D positions at seven points in time. Although only wingtip trajectories are shown here, trajectories are generated for all 16 anatomical markers (see Figure 6A). Abbreviation: DLC, DeepLabCut.

Labeling the dataset-DLTdv
For each case study, we labeled video frames and separated them into a training dataset and a test dataset. The training dataset is used to train DLC networks, and the test dataset is used to evaluate their performance.
We digitized training and test videos using DLTdv 5 (DLTdv8a, version 8.3.1, 2022) rather than DLC's built-in digitization features because of its robust feature set for multicamera digitization and its integration with 3D calibrations generated with easyWand. We only digitized clearly visible anatomical landmarks; if a landmark was occluded, for instance by the wing, we did not digitize it. We used custom MATLAB code to convert DLTdv output files into DLC-compatible files (see the GitHub repository linked in section "Code and data" in Supporting Information). To reduce video memory usage during DLC processing and to reduce the size of the training dataset, the function we wrote for generating the training frames allows cropping of the output frames around the area containing the bat. However, cropping may not be required if sufficient GPU video RAM (i.e., VRAM) is available.
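The cropping of training frames can be illustrated with a minimal sketch; `crop_around_bat` is a hypothetical helper, not the function from our repository, and assumes digitized 2D landmark coordinates in pixels:

```python
def crop_around_bat(frame_w, frame_h, points, pad=50):
    """Return a (left, top, right, bottom) crop window enclosing all
    digitized 2D points ((x, y) in pixels) plus a padding margin,
    clamped to the frame boundaries."""
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    left = max(0, int(min(xs)) - pad)
    top = max(0, int(min(ys)) - pad)
    right = min(frame_w, int(max(xs)) + pad)
    bottom = min(frame_h, int(max(ys)) + pad)
    return left, top, right, bottom
```

Because the crop origin shifts the image coordinate system, any cropping offsets must be carried forward to the 3D reconstruction step, as noted for ThruTracker below.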

Deep learning-DeepLabCut
For each case study, we used DLC 11 to train deep learning models to detect the positions of 16 anatomical landmarks (see colored dots in Figure 1B and labels in Figure 6B). Training was done on a total of 1062 training frames for case study 1 and 1078 training frames for case study 2, prepared as described above, on a workstation PC (Dell OptiPlex 7071; GPU: NVIDIA GeForce RTX 2080, 8 GB VRAM; OS: Windows 10; processor: Intel i9-9900, 3.10 GHz). For both case studies, we used DeepLabCut version 2.3.0, initial weights ResNet-50, a batch size of 2, the Adam optimizer, augmenter type imgaug, and a global scale factor of 1 (compared to the default of 0.8) (see Table S1 in Supporting Information).
For case study 2 (free flight enclosure), the effective frame size of the videos was comparatively small (420 × 420 pixels on average) because of dynamic cropping around the bat in a more zoomed-out view compared to case study 1 (596 × 511 pixels on average). We, therefore, lowered the DLC position distance threshold 26 from the default of 17 to 10 pixels. Furthermore, to lessen the propensity for left-right swapping in DLC, a tendency we describe further below, we used the flip left-right augmentation fliplr, 27,28 and changed the rotation augmentation from the default maximum of 25° to a maximum of 180° (see Ref. 29 for more details on using the fliplr augmentation).
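As a rough illustration, the settings above amount to overriding a handful of entries in DLC's pose_cfg.yaml; the sketch below expresses this with plain dictionaries, and the key names (which follow DLC 2.3 conventions) should be checked against the DLC version in use:

```python
# Illustrative defaults and case-study-2 overrides for DLC's pose_cfg.yaml.
# Key names follow DLC 2.3 conventions; exact names may vary by version.
POSE_CFG_DEFAULTS = {"global_scale": 0.8, "pos_dist_thresh": 17,
                     "fliplr": False, "rotation": 25}

CASE_STUDY_2_OVERRIDES = {
    "global_scale": 1.0,     # train at full scale instead of 0.8
    "pos_dist_thresh": 10,   # stricter match radius for small effective frames
    "fliplr": True,          # mirror augmentation against left-right swapping
    "rotation": 180,         # rotate up to 180 degrees instead of 25
}

def apply_overrides(defaults, overrides):
    """Return a new configuration dict with overrides applied on top."""
    cfg = dict(defaults)
    cfg.update(overrides)
    return cfg
```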
The full DLC configurations are described in the section "DeepLabCut details" in the Supporting Information, and the full DLC projects are available in the Dryad repository linked to in section "Code and data" in the Supporting Information.
As mentioned above, the training data were cropped around the bat in the videos. To ensure the videos were analyzed consistently with the training data, we also cropped the analyzed videos around the bat. For case study 1, the bats were flying steadily in the image frame, and we cropped the videos using PFV (Photron Fastcam Viewer, version 4.0.4.0, build 742, Photron Limited). For case study 2, the bats were moving over a large area of the frame, and we thus cropped dynamically around the bat using the MATLAB application ThruTracker, 30 which outputs the cropping parameters needed for 3D reconstruction.

Reconstruction and filtering of landmark positions
For the manually digitized landmarks, we used MATLAB functions from DLTdv, available in the DLTdv GitHub repository, 31 to reconstruct the manual digitizations and obtain 3D positions. For the automatically digitized landmarks, we used custom algorithms that take DLC's specific features into account. For example, we rejected low-confidence DLC predictions and implemented an algorithm to correct left-right swapping of bilateral anatomical landmarks (Figure 2), a common issue with DLC predictions in our case studies.
For both human and automatic digitizations, we generated potential 3D coordinates for each landmark from 2D predictions using all cameras and subsets of cameras (Figure 2B). We then compared camera combinations based on their reprojection errors, rejecting those likely to contain erroneous digitizations in one or more cameras and selecting the camera combination with the lowest reprojection error.
To ensure that the algorithm does not excessively favor fewer cameras when DLC predictions are accurate in all cameras, we apply a penalty: we artificially increase the reprojection error of combinations using fewer than all cameras before choosing the winning camera combination (see Table S2). For automatically reconstructing distal points, we also apply a custom algorithm, inspired by Ref. 32, that considers DLC confidence and DLTdv reprojection error to correct for occasional left-right confusion (e.g., DLC sometimes confusing the left for the right wingtip or other symmetrical markers; see Figure 2C).
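A minimal sketch of this penalized selection, assuming per-combination mean residuals have already been computed; the helper name and penalty value are illustrative, not those of our MATLAB implementation:

```python
def pick_camera_combination(residuals, penalty_px=0.5, min_cams=2):
    """Choose the camera subset whose 3D reconstruction has the lowest
    penalized reprojection error. `residuals` maps camera subsets
    (tuples of camera indices) to mean reprojection residuals in pixels.
    Subsets smaller than the full set incur a penalty per missing camera,
    so fewer cameras win only when they are clearly better."""
    n_cams = max(max(combo) for combo in residuals) + 1
    best_combo, best_score = None, float("inf")
    for combo, res in residuals.items():
        if len(combo) < min_cams:
            continue  # at least two views are needed for 3D reconstruction
        score = res + penalty_px * (n_cams - len(combo))
        if score < best_score:
            best_combo, best_score = combo, score
    return best_combo

# Hypothetical residuals for a three-camera setup: camera 2 is off,
# so the (0, 1) pair wins despite the penalty for dropping a camera.
residuals = {(0, 1, 2): 6.0, (0, 1): 0.8, (0, 2): 5.5, (1, 2): 5.9}
```

When all cameras agree, the penalty keeps the full set in front: with residuals of 1.0 for all three cameras and 0.9 for a pair, the full set still wins.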
For each of these quality measures (DLC confidence and reprojection error), we defined an acceptability threshold and a veto threshold; failing the veto threshold overrides the acceptability measures. Therefore, if two or more measures met the acceptability threshold and none failed the veto threshold, the 3D coordinates were accepted; if fewer than two measures met the acceptability thresholds, or if one failed the veto threshold, the 3D position was not accepted. Thresholds varied between case studies (see Table S2 in Supporting Information for the values used and additional clarification of thresholds). In case study 1, 3D positions of the anatomical landmarks from both manual and automatic digitization were smoothed. In case study 2, only every 50th frame was manually digitized for the test trials, so only 3D trajectories resulting from automatic digitization were smoothed. In both cases, we smoothed using a combination of robust local regression and Butterworth filtering.
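The two-tier acceptance rule can be sketched generically as follows, assuming each quality measure is expressed so that larger values are better (e.g., negated reprojection error); the actual measures and threshold values are given in Table S2:

```python
def accept_landmark(measures, accept_thr, veto_thr):
    """Two-tier quality check. All three arguments map measure names
    (e.g., 'dlc_confidence', 'neg_reproj_error') to values, with each
    measure oriented so that larger is better. Accept only if at least
    two measures meet their acceptability threshold and none falls
    below its veto threshold."""
    n_acceptable = sum(measures[m] >= accept_thr[m] for m in measures)
    any_veto = any(measures[m] < veto_thr[m] for m in measures)
    return n_acceptable >= 2 and not any_veto
```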
In summary, this method accounts for a combination of DLC confidence and DLTdv reprojection error when accepting, correcting, or rejecting reconstructed 3D landmark positions.

Case studies
In the first case study, we compared the kinematics of Carollia perspicillata during steady flight at three speeds in a large wind tunnel. We repeated this experiment with and without markers applied to anatomical landmarks on the bats. For trials with markers, human digitizers labeled each point manually; for trials without markers, we used our novel workflow combining DLC and DLTdv. This allowed us to compare kinematics generated from the same individual bats flying under controlled conditions using a traditional workflow with markers and human digitizers versus our new automated workflow applied to bats flying without markers (Figure 3A). We wish to emphasize, for clarity, that the nature of the comparison (manually digitizing videos of bats with markers applied versus automatically digitizing videos of bats without markers applied) means that we are not comparing the two methods applied to the same set of videos. The two methods producing similar kinematic values and similar variance would support the rigor and usability of the new method. We do not, however, expect identical results from the two approaches because of the natural variation among trials.

F I G U R E 2 Overview of how we use DeepLabCut (DLC) confidences and DLTdv reprojection errors to select or reject DLC predictions. (A)
Flowchart depicting how our process uses initial per-frame, per-landmark DLC image coordinate predictions, prediction confidences, and the resulting DLTdv residuals from each camera to select winning combinations for reconstructing anatomical landmarks' 3D positions. Treatment of a landmark depends on whether it lies at the body midline or distally, with DLC predictions of paired distal landmarks considered in unison. (B) Example of how the DLTdv residual (reprojection error) is used to reject erroneous predictions. Numbers followed by px denote the resulting residuals, measured in pixels, of the different camera combinations. In camera 2, the position of the focal anatomical landmark (the nose) is erroneously placed too caudally; because of this, camera combinations that include camera 2 place the landmark's 3D position more caudally than those that do not use this prediction, resulting in high residuals (red, crossed-out boxes). The camera combination that excludes camera 2, therefore, has the lowest residual and is selected as the winning combination (green, not crossed-out box). When comparing the reprojection errors among camera combinations, a penalty is applied for using fewer than all cameras to avoid biasing the algorithm toward combinations with fewer cameras. (C) Example of integrated consideration of right and left wingtip predictions to find correct matches of paired points. This example uses only two cameras for simplicity, but the principle holds for any number of cameras. Numbers followed by "%" represent DLC confidences of predictions, and numbers followed by px denote the residuals, measured in pixels, of camera combinations. Note that DLC has predicted left and right incorrectly in camera 2.
Because camera 1 has higher confidences, and because the DLC-assigned left-left and right-right combinations result in high residuals (red, crossed-out boxes), we deduce that DLC has assigned the actual left wingtip as the right one, and vice versa, in camera 2. We, therefore, correct for this and arrive at the optimal camera combinations correctly assigned to the left and right wingtips (green, not crossed-out boxes). Abbreviations: DLC, DeepLabCut; NaN, not a number.
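The swap-correction logic of panel (C) can be reduced to a minimal sketch; the function below is an illustrative simplification of our algorithm, which additionally weighs per-camera combinations:

```python
def resolve_left_right(residual_straight, residual_swapped,
                       conf_straight, conf_swapped):
    """Decide whether DLC's left/right assignment in one camera should be
    swapped. Each residual is the mean reprojection error (pixels) of the
    left-left/right-right pairing across cameras under that assignment;
    each confidence is the mean DLC confidence of the paired predictions.
    Prefer the assignment with the lower residual, using confidence as a
    tie-breaker. Returns True when the swapped assignment wins."""
    if residual_swapped < residual_straight:
        return True
    if residual_swapped == residual_straight:
        return conf_swapped > conf_straight
    return False
```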
In our second case study, we recorded three bat species (Dasypterus intermedius, Lasiurus borealis, and Nycticeius humeralis) flying freely of their own volition from their housing within a large L-shaped flight enclosure at the Austin Bat Refuge. Bats were not handled by the experimenters, so the exact number of individuals included in the study could not be accurately counted. No markers were applied to the bats (Figure 3B). We chose this setting to reflect a less controlled environment where bats were free to execute a variety of maneuvers such as turning and pursuing prey. In this case study, the same trials were digitized by two human labelers, as well as by our automated DLC-DLTdv workflow. This resulted in human- and automatically generated 3D points for the same 16-point set of landmarks in each trial.
Here, the success of the novel method is indicated by the similarity of the average distance between automatically generated 3D points and human-generated points to the average distance between two human labelers.

F I G U R E 3 Overview of the testing procedures for our two case studies. (A) For case study 1, we studied individual bats with and without markers applied to anatomical landmarks and compared the kinematics yielded from manually digitizing the markered trials to those generated by our automated workflow applied to markerless trials. (B) For case study 2, bats were not handled and no markers were applied. We compared the results from manually and automatically digitizing the same set of markerless trials.
In both case studies, the performance of our workflow is compared to that of a single human digitizer per trial. This is not to suggest that a single human digitizer represents the cutting edge in terms of accuracy, but rather to make the comparison relevant to the majority of studies in the field of animal biomechanics (see section "Literature review of typical number of digitizers in animal kinematics studies" in Supporting Information).

We filmed the bats with multiple synchronized cameras (see Figure S2a for a schematic of the test section and the approximate placement of cameras and lights). We chose each camera angle to capture a different perspective of the bats' flight, providing a comprehensive view of their movements and making it possible to capture the complexity of the bats' wing movement. We calibrated the focal volume for 3D analysis by filming a checkerboard pattern, automatically detecting the positions of square corners using the MATLAB function detectCheckerboardPoints, and using the checkerboard's diagonal vector as the wand distance in easyWand. 25,37 The resulting calibrations were further improved by incorporating the digitized wingtip landmarks as background points in easyWand, leading to wand scores (standard deviation of reconstructed wand lengths divided by the wand length) below the suggested maximum of 1 (0.56 and 0.24) for the 2 days of videography.
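The wand score defined above can be computed directly from the reconstructed wand lengths; this is a minimal sketch, assuming easyWand's convention of reporting the ratio as a percentage (which matches the values quoted here):

```python
from statistics import mean, pstdev

def wand_score(reconstructed_lengths):
    """Wand score: standard deviation of the reconstructed wand lengths
    divided by the (mean) wand length, expressed as a percentage."""
    lengths = list(reconstructed_lengths)
    return 100.0 * pstdev(lengths) / mean(lengths)
```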
We aimed to elicit three flights per treatment per speed for each study subject (18 trials per individual, 72 trials in total). We reached our goal with two of the four study bats (2 × 18 trials). Two bats were reluctant to fly steadily at 6 m/s; one did so for three trials with the marker treatment but only twice for the markerless treatment (17 trials), and another did so only once per treatment (14 trials). Thus, 67 usable trials were elicited in total, 34 with markers and 33 without. Because experimental subjects such as bats, as well as a diversity of other animals, sometimes fail to perform locomotor tasks optimally during data collection, a reasonable rule of thumb is to train one and a half to two times as many individuals as required for an intended study sample.
We visually inspected each flight's videos to identify the most stable wingbeat (i.e., the wingbeat during which the bat changed its position in the test section of the wind tunnel the least). A wingbeat period was defined as the period between two consecutive maxima of the wingtip position along the dorsoventral axis. The wingbeat prior to the one selected for analysis was used as training data. Specifically, every 10th frame of that wingbeat was digitized in DLTdv and converted into DLC training data.
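Wingbeat periods defined this way can be found with simple local-maximum detection on the wingtip's dorsoventral trace; the sketch below is illustrative (`min_separation` is a hypothetical noise guard, not a parameter of our pipeline):

```python
import math

def wingbeat_periods(z, min_separation=5):
    """Find wingbeat periods from the wingtip's dorsoventral position.
    A wingbeat spans two consecutive local maxima of `z` (one sample per
    frame); `min_separation` frames suppresses spurious maxima from noise.
    Returns a list of (start_frame, end_frame) tuples."""
    peaks = []
    for i in range(1, len(z) - 1):
        if z[i] > z[i - 1] and z[i] >= z[i + 1]:
            if not peaks or i - peaks[-1] >= min_separation:
                peaks.append(i)
    return list(zip(peaks[:-1], peaks[1:]))

# Synthetic two-wingbeat trace with maxima at frames 5, 15, and 25.
trace = [math.sin(2 * math.pi * (i - 5) / 10 + math.pi / 2) for i in range(30)]
```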

Analysis
The output of the 3D reconstruction consists of 3D trajectories for the anatomical landmarks calculated for each trial, speed, and method. The 3D trajectories were transformed into a bat body-centric coordinate system. We then used linear mixed-effects (LME) modeling to test the effect of the digitization method on the resulting anatomical landmark positions over the wingbeat. The model included normalized time as a fixed nonlinear term, modeled with natural splines to allow flexibility in capturing time-related changes, with degrees of freedom selected from 1 to 15 based on the lowest Akaike information criterion score. Individual and trial number were included as random effects.
Fixed effects for speed and method were also included. The model was fitted using maximum likelihood estimation. Due to singularity issues encountered when including trial number as a random effect for the ankle landmark in the z dimension, this variable was excluded from the model for that specific analysis. This decision was made to ensure model convergence and reliability of the estimates. For visualization, the wingbeat periods were standardized to a 0%−100% range, such that the time point of the first frame of the wingbeat is 0% and that of the last frame is 100%. The wingbeats were then grouped by speed, spatial dimension (x, y, and z), and digitization method (manual digitization with markers applied and automatic digitization without markers). For each group, a standardized wingbeat was formed by calculating the mean position over time (see Ref. 38 for an example of constructing standardized wing strokes) as well as the 95% confidence interval over time (Figure 4). In addition, we tested whether the variance of the landmark positions over the wingbeat differed depending on the digitization method using Levene's test on the residuals of the landmark positions from the LME models.
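The 0%−100% standardization amounts to resampling each wingbeat onto a common time base before averaging; a minimal sketch for one coordinate of one landmark (the function name and sample count of 101, i.e., 1% steps, are illustrative):

```python
def normalize_wingbeat(positions, n_samples=101):
    """Resample one wingbeat's landmark positions onto a common 0-100%
    time base by linear interpolation, so wingbeats of different durations
    can be averaged into a standardized wingbeat."""
    n = len(positions)
    out = []
    for k in range(n_samples):
        t = k * (n - 1) / (n_samples - 1)   # fractional frame index
        i = min(int(t), n - 2)
        frac = t - i
        out.append(positions[i] * (1 - frac) + positions[i + 1] * frac)
    return out
```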
We also calculated six derived wing kinematic parameters for each wingbeat to test the functional difference between the two methods of digitization. The derived wing kinematic parameters were chosen to form a representative group of parameters often calculated in the literature (e.g., Refs. 39-41) and to incorporate different sets of anatomical landmarks. These parameters were also grouped by speed and digitization method and then compared using LME models with digitization method, speed, and their interaction as fixed effects and individual as a random effect.

We recorded a total of 223 flight events. We discarded all but 35 trials for use in analysis after removing flights where bats were either not visible in all three cameras or out of focus in one or more cameras. From the remaining 35 trials, 10 were randomly discarded to reduce labor, leaving 25 for inclusion in the final dataset. Among these, the most common species was N. humeralis with 11 trials. We randomly selected six of those trials as testing data and used the remaining 19 trials as DLC training data; see section "Motion capture" below for a clarification of testing and training data. For each test trial, every 50th frame was digitized manually, resulting in approximately one digitized frame per wingbeat. The digitization was performed twice by two different human digitizers to allow for quantifying human-to-human variability.
We then applied our workflow to all test videos and compared the results to that of one of the human digitizers, selected at random.

Analysis
F I G U R E 4 Position of anatomical landmarks in the bat body-centric coordinate system in case study 1 (with origin at the sternum landmark position) organized by speed and dimension (x: craniocaudal, y: proximodistal, z: dorsoventral) for normalized wingbeat periods starting and ending at maximal upstroke. Teal shows results from manual digitizing; magenta from automatic processing (mean ± 95% confidence interval). Note that the range of values shown in vertical axes differs between plots. Cartoon bat at the bottom left shows the approximate directions of the x, y, and z axes. For location of the tracked anatomical landmarks on the wing, see Figure 6B.

The output of the reconstructions comprised 3D coordinates for anatomical landmarks obtained from both manual and automatic digitization. Since each trial was manually digitized by two different individuals, we could calculate the average human-to-human error for each anatomical landmark. To determine the average human-to-automatic error, we compared the 3D positions obtained from one human digitizer, selected randomly per trial, to those generated by automatic digitization using our method. We then compared the human-to-human error statistically to the human-to-automatic error using a two-sample t-test with unequal variance. In addition, we investigated how the human-to-human error and the human-to-automatic error varied with the average distance between the bat and the three cameras via LME modeling.
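The unequal-variance comparison is Welch's two-sample t-test; a sketch of the test statistic and Welch-Satterthwaite degrees of freedom (the p-value is then read from the t distribution with the returned, generally non-integer, df):

```python
from statistics import mean, variance

def welch_t(a, b):
    """Welch's two-sample t statistic and Welch-Satterthwaite degrees of
    freedom for samples with unequal variances."""
    na, nb = len(a), len(b)
    va, vb = variance(a) / na, variance(b) / nb   # squared standard errors
    t = (mean(a) - mean(b)) / (va + vb) ** 0.5
    df = (va + vb) ** 2 / (va ** 2 / (na - 1) + vb ** 2 / (nb - 1))
    return t, df
```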

Statistical analysis
Statistical tests were performed in MATLAB and R. The statistical modeling of the anatomical landmark positions for case study 1 was performed in R (version 4.3.2, 2023-10-31 ucrt), using RStudio (2023.12.1 Build 402, Posit Software, PBC). The remaining statistical testing was conducted in MATLAB. When multiple testing was performed on related data (dependency of anatomical landmark position in the x, y, and z dimensions on digitizing method and speed, dependency of kinematic parameters on digitizing method and speed in case study 1, and 3D accuracy per landmark in case study 2), p-values were adjusted to account for the false discovery rate 42 using the MATLAB function mafdr with the "BHFDR" flag set to true.

Based on an analysis of the data using LME models, only two out of 18 of the x, y, and z coordinates for the six anatomical landmarks were significantly different between automated and manual approaches (mean difference ± standard deviation = 1.57 ± 1.44 mm; Table S3). Speed was a significant factor in four of 18 positional coordinates (Table S3). To test whether human versus automated methods had different variances, we applied Levene's test to the residuals of each coordinate from our LME models. This resulted in six coordinate variances that were not significantly different between automated and manual approaches, six coordinates with significantly higher variance for automated methods, and six coordinates with significantly higher variance for manual methods (Table S4). Overall, this indicates that manual and automated methods provided similar estimates of anatomical landmark coordinates in nearly all cases and that the two approaches had similar amounts of measurement error, as indicated by the variance in measurements after factoring out other confounding variables using LME.

F I G U R E 5 Wing kinematic parameters by speed in case study 1. Teal represents human digitizers; magenta represents digitization using our automatic workflow. Error bars show 95% confidence intervals. Abbreviations: AoA, angle of attack; DS, downstroke; mid DS, middle of the downstroke; US, upstroke. See Table 1 for corresponding numerical values.
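The false-discovery-rate adjustment used in the statistical analysis (MATLAB's mafdr with the BHFDR flag) implements the Benjamini-Hochberg procedure, which can be sketched as:

```python
def bh_adjust(pvals):
    """Benjamini-Hochberg adjusted p-values: sort ascending, scale the
    i-th smallest p-value by m/i, then enforce monotonicity from the
    largest rank downward so adjusted values never decrease with rank."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):          # walk from largest p to smallest
        i = order[rank - 1]
        running_min = min(running_min, pvals[i] * m / rank)
        adjusted[i] = running_min
    return adjusted
```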

Case study 1: Steady flight in wind tunnel
We also compared the values of six wing kinematic parameters: wing stroke amplitude, wingbeat frequency, stroke plane angle (during up- and downstroke separately), angle of attack at the middle of the downstroke, and wing area at mid-downstroke (Figure 5). LME modeling showed that the digitization method (automatic versus manual) had no significant effect on mean values for five out of six parameters.
For wingbeat frequency, both the digitization method and the interaction of speed and digitization method had a significant influence (Table 1). For four out of six derived kinematic parameters, speed had no significant influence. However, the stroke plane angle increased significantly with speed during both up- and downstroke (see Table 1 for an overview of differences between methods and Table S5 for comprehensive statistical details, and see Movie S1 for an example of a trial from case study 1 digitized using our workflow).

TA B L E 1 Means of parameters (means with 95% confidence intervals in square brackets) and p-values for speed, digitizing method, and the interaction of the two from linear mixed-effects comparisons between measurements of kinematic parameters derived from automatic and manual digitization in case study 1.

Case study 2: Free flight enclosure
The mean distance between 3D positions derived from manual versus automatic digitization across all landmarks for our final workflow was 3.11 mm (95% CI [2.90, 3.31]), compared to a 2.45 mm (95% CI [2.16, 2.73]) mean difference between 3D points generated by two humans (Table 2). Therefore, on average, humans performed less than 1 mm better (0.66 mm), a small but statistically significant difference (t = −3.68 (1067.4), p < 0.001, two-sample t-test). For four anatomical landmarks (5th digit, elbow, ankle, and nose), human-to-human variation was significantly smaller than human-to-automatic variation, by 1.04−2.26 mm (Table 2). Overall, error rates were quite low for both human- and automatically generated points compared to the scale of the bat (forearm length 34−38 mm, 43 wingspan 211 mm). However, some points, such as the sternum, showed higher error than others, such as the wingtips and 5th digit (Figure 6).
Both human-to-human and human-to-automatic variation increased with the average distance between the bats and the cameras, meaning accuracy was lower when the bat was further from the cameras (t = 12.29 (1161), p < 0.001, LME model). This relationship was more pronounced for the human-to-human variation (t = −2.36 (161), p = 0.02, LME model) (Figure S4).
In terms of pixel errors of the DLC predictions from the network used in our final workflow, the average human-to-human distance was significantly smaller for all anatomical landmarks except the sternum, for which the average human-to-DLC distance was significantly smaller (two-sample t-test; see Table S6 in the supplement for detailed differences and p-values). Averaged over all predictions, the human-to-human distance was 1.66 pixels (95% CI [1.57, 1.75]) and the human-to-DLC distance was 3.17 pixels (95% CI [2.87, 3.46]), meaning the average human-to-human distance was 1.51 pixels smaller.
Accuracy of our workflow improved sharply from the initial DLC network that we trained to our final workflow, which approached the accuracy of humans (Figure 7). Predicted landmark 3D coordinates from the initial DLC network, trained without flip left-right augmentation and used without our custom filter or correction of erroneous left-right swapping, had an average error across all landmarks of 18.68 mm (95% CI [15.13, 22.22]), and an average error of over 50 mm for the wingtip landmarks. The largest improvement in accuracy came from the addition of flip left-right augmentation; however, high error rates for some paired points, such as the wingtips, remained.
Our final workflow, which incorporated additional processing steps to fix left-right swapping of paired points, yielded a small improvement in overall accuracy but a larger improvement for the wingtips (see Movie S2 for an example of a trial from case study 2 digitized using our workflow).

TABLE 2 Distance between landmarks placed by two different human digitizers, and distance between landmarks placed using our automatic workflow and one randomly chosen human digitizer per trial, for each anatomical landmark in case study 2.
Note: Shown numbers are means with 95% confidence intervals in parentheses. Columns labeled "n" show the number of reconstructed landmarks the calculations are based on. Slight differences in n stem from instances where one human digitizer did not deem a landmark visible enough to label, whereas the other did. Bold indicates statistical significance.
a Average of the per-frame difference between the human−human difference and the human−auto difference for that landmark; negative values indicate that the human−auto difference was larger than the human−human difference for that landmark.

DISCUSSION
We developed and tested the effectiveness of a workflow that leverages the strengths of DLC and DLTdv for automatic tracking of the 3D positions of anatomical landmarks on bats in steady wind tunnel and free flight. For steady flight in a wind tunnel, we show that humans manually digitizing bats with markers applied to anatomical landmarks yield flight kinematics that, for the most part, are qualitatively and quantitatively similar to those generated by our automated workflow for tracking landmarks on unmarked bats (Figures 4 and 5). One exception was that wingbeat frequency, as well as its relationship to speed, differed depending on the digitization method in case study 1. The results are, however, derived from different sets of trials, meaning that no matter the accuracy of digitization, we do not expect identical results. Even so, we inspected the wingtip trajectories that the wingbeat frequencies were calculated from and found no indication of faulty digitization. Furthermore, both the LME modeling of the wingtip coordinate position in case study 1 and the accuracy of wingtip digitization as measured in case study 2 indicate that the tracking of the wingtip using our workflow is on par with human digitization. We thus conclude that the observed difference in wingbeat frequency likely reflects a real-world difference between the two trial groups.
In our second test, we show that our automatic workflow generates highly accurate reconstructions of bats executing complex maneuvers in a large outdoor flight arena. The 2D predictions from our final DLC network, which utilized the flip left-right augmentation, had an average error of 3.17 pixels. This was 91% larger than the average pixel distance between labels placed by two different human digitizers (1.66 pixels; Table S6). Despite this, after applying our workflow, the overall mean error for all tested landmark position predictions was 3.11 mm, only 0.66 mm, or 27%, greater than for humans (Table 2).
This demonstrates the robustness of our 3D workflow for minimizing the cascading errors that result from 2D data with small to moderate inaccuracies.
In addition, regardless of the digitizing method, digitization accuracy decreased with the average distance between the bats and the cameras. This decrease in accuracy likely results from bats being less well illuminated and smaller in the frame when they are further from the cameras. Although the difference was small, the decrease in accuracy with distance to the cameras was more pronounced for humans than for our workflow, indicating that DLC may be more robust to worsening image quality than our human digitizers.
Together, the results from our case studies show that our process of combining deep learning pose estimation models with well-known computer vision tools for 3D reconstruction can produce highly usable 3D kinematic data for biomechanical studies of flying animals such as bats. We conclude that the approach demonstrated here provides accuracy on par with, or only slightly worse than, traditional manual techniques for quantifying animal flight kinematics, that it works without applying markers to animals, and that it requires considerably less processing time than traditional approaches, especially for larger datasets.
Three-dimensional reconstruction accuracy improved substantially with additional processing steps (Figure 7). Using DLC alone, even with optimized network settings, provided relatively poor results. Applying a flip left-right augmentation improved accuracy considerably, but results were not on par with human digitizing. This was especially true for the wingtips, which are key markers for biomechanical analysis.
This was caused by left-right swapping, where the network mislabeled paired landmarks for each other (e.g., the right wingtip for the left wingtip and vice versa). We solved this problem using a combination of DLC prediction confidences, 3D reprojection errors of points reconstructed with all available camera combinations, and the movement of 3D points over time (Figure 2). These approaches largely solved the left-right swapping problem and provided results with errors that did not differ significantly from human accuracy for seven of the 10 tracked landmarks, and that were otherwise 1.3−2.3 mm larger than the corresponding human error, for an overall difference in error of 0.66 mm (Table 2).
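The temporal-continuity component of this correction can be sketched as follows. This is an illustrative simplification, not our full pipeline: the complete workflow also weighs DLC confidences and multi-camera reprojection errors, which are omitted here. At each frame, the paired landmark is assigned the labeling (original or swapped) that minimizes displacement from the previous corrected frame:

```python
import numpy as np

def fix_left_right_swaps(left, right):
    """Correct left-right swaps of a paired landmark using temporal continuity.

    left, right: (n_frames, 3) arrays of 3D coordinates for a bilateral
    landmark pair (e.g., left and right wingtip). For each frame, keep
    the assignment (as-is or swapped) whose points move least relative
    to the previous corrected frame. Returns corrected copies.
    """
    left, right = left.copy(), right.copy()
    for t in range(1, len(left)):
        keep = (np.linalg.norm(left[t] - left[t - 1])
                + np.linalg.norm(right[t] - right[t - 1]))
        swap = (np.linalg.norm(right[t] - left[t - 1])
                + np.linalg.norm(left[t] - right[t - 1]))
        if swap < keep:  # swapped labels are more consistent with the past
            left[t], right[t] = right[t].copy(), left[t].copy()
    return left, right
```

A rule of this kind only needs the first frame of a trajectory to be labeled correctly; confidence- and reprojection-based checks can anchor that initial assignment.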
During flight, bats can be filmed from all directions in both the horizontal and vertical planes. Their highly articulated wings lead to landmarks that are frequently occluded by other body parts. The unique anatomy of bat wings, which allows for many degrees of freedom, and the high compliance of the wing membrane lead to an exceptionally large range of 3D wing shapes in flight. 22 Even small differences in wing position and configuration are often of interest for biomechanical study, so high 3D reconstruction accuracy is required for many analyses. Data collected in less controlled environments, such as for case study 2, typically contain more variation than data from a controlled lab setting. Therefore, it should not be surprising that custom processing steps were required to acquire satisfactory results for this study. However, the benefits of the approach demonstrated here extend beyond bats to other flying animals and to other systems in which animals change body or limb shapes to a high degree and in which left-right swapping may be a problem because of viewing angle.
A major benefit of using DLC is that it does not require the use of markers. This results in less work for experimenters, less stress for the animals being handled, and the ability to track the movements of animals in fully natural settings. The results from case study 1 show that markers are not needed to produce accurate automatic tracking. However, a limitation of a markerless approach is that less visually distinct anatomical landmarks, such as the sternum, tend to show higher error rates (Figure 6 and Table 2). Other points that we did not track, such as the finger joints, would likely have a similar problem. Researchers may want to track the finger joints, for example, to measure wing camber.
In this situation, researchers may consider adopting a mixed approach combining markerless and markered tracking.Retraining neural networks using additional training data of this nature should provide excellent results with this approach.
Recent studies have demonstrated improvements in leveraging the nature of multicamera setups with deep learning models. For example, Lightning Pose and Bkin-3D provide approaches that better use 3D data for unsupervised and partially supervised training of deep learning networks. 44,45 Anipose provides an integrated 3D workflow that is in some ways similar to the workflow presented here, although it uses different approaches, such as Kalman and Viterbi algorithms for resolving errors in 2D detections. 46 A detailed comparison of the performance of these approaches relative to the workflow presented here is beyond the scope of this study; however, important future research in this area will be to provide tools for integrating the best aspects of these and other emerging computational techniques. We believe the strength of the workflow demonstrated here is its ability to bridge two widely used programs: DLTdv, for its robust multicamera digitization platform and algorithms for 3D calibration and reconstruction, 5,25,32 and DLC, for its deep learning detection algorithms in 2D. 11,18
Reliance on manual digitization is not only time-consuming but also prone to inconsistent quality due to the fatigue that can result from the repetitive and mundane task of digitizing. A major aim of this work is, therefore, to dramatically reduce the time required for manual digitization of videos of flying bats and other animals in future studies, thereby increasing the efficiency and scalability of investigations into animal flight. We timed a highly trained student manually labeling bats according to the digitizing scheme used in this study (i.e., 16 landmarks). On average, manual labeling took 70 s per frame and camera. For motion capture with three cameras at a framerate of 500 fps and a wingbeat frequency of 10 wingbeats per second, a single wingbeat requires close to 3 h to digitize. With this approach, a multispecies (e.g., five species) study of continuous flight, with 15 trials per species and five wingbeats per trial, would take close to 7 months of full-time labor for manual digitization. By contrast, we used around 1100 frames of training data per case study in this project, a little more than 1 week of full-time labor, which resulted in networks that we used in conjunction with our workflow to analyze over 16,000 frames of video in total. For this project alone, this means we replaced around 39 days of full-time labor spent digitizing with around 5.5 days of digitizing training data, a sevenfold reduction in manual labor. But since the training data can, and will, be reused for future projects, the true gains in time and energy will be significantly larger; gains that we hope will allow for easier scaling of future videography-based bat research to include larger numbers of individuals, trials, and species.
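The back-of-the-envelope arithmetic above can be reproduced as follows; the 40 h work week and 160 h work month are our assumptions, not figures stated in the text:

```python
# Manual digitization time estimates (per-frame timing from the text).
SECONDS_PER_FRAME_PER_CAMERA = 70
CAMERAS = 3
FPS = 500
WINGBEAT_HZ = 10

frames_per_wingbeat = FPS // WINGBEAT_HZ  # 50 frames cover one wingbeat
hours_per_wingbeat = (
    frames_per_wingbeat * CAMERAS * SECONDS_PER_FRAME_PER_CAMERA / 3600
)
print(f"one wingbeat: {hours_per_wingbeat:.1f} h")  # one wingbeat: 2.9 h

# Hypothetical multispecies study: 5 species x 15 trials x 5 wingbeats each.
total_hours = 5 * 15 * 5 * hours_per_wingbeat
months = total_hours / 160  # assuming 160 h of full-time labor per month
print(f"whole study: {months:.1f} months")  # whole study: 6.8 months
```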
The results presented here were achieved without improving DLC performance by digitizing frames from the analyzed videos for which accuracy was poor (i.e., without refining the labels). 18 In practice, when using this workflow for other projects, we would inspect the per-trial performance, manually digitize sections where prediction errors occurred, and convert those digitizations into training data before retraining the network. This iterative process of inspecting and improving would likely result in higher accuracy. As a test of this principle, we retrained our DLC network after adding training data consisting of every 10th frame of a complex bat-flipping maneuver, enabling it to accurately track a maneuver that the network had not seen previously (Movie S3).

FIGURE 1 Overview of the workflow. (A) The bat is filmed with multiple synchronized cameras, calibrated for 3D reconstruction. (B) Videos are analyzed in DeepLabCut (DLC), resulting in predicted image coordinates and confidences of the predictions for anatomical landmarks in each frame. Low-confidence detections are filtered out. (C) Image coordinates of the anatomical landmarks are reconstructed into 3D coordinates using the camera calibration and DLTdv. Filtering of inaccurate 2D predictions and correction of left-right swapping of bilateral landmarks such as wingtips (see Figure 2B and C) are guided by DLC prediction confidences and DLTdv reprojection errors. (D) The workflow uses DLC confidences and DLTdv residuals to combine information from multiple 2D detections of anatomical landmarks into 3D coordinates of the landmarks, while filtering erroneous predictions and correcting for left-right symmetrical swapping of anatomical landmarks in individual cameras (see sections "Further details about 3D reconstruction and filtering of landmark positions," "Smoothing of 3D trajectories," and "Filtering parameters used in the case studies" in Supporting Information).
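The reconstruction step (C) can be illustrated with a minimal direct linear transformation (DLT) triangulation in the 11-coefficient convention used by DLTdv. The function name, the confidence threshold `min_conf`, and the filtering rule here are illustrative simplifications, not code from either package:

```python
import numpy as np

def dlt_reconstruct(coefs, uv, conf, min_conf=0.9):
    """Reconstruct one 3D point from 2D detections in multiple cameras.

    coefs: (n_cams, 11) DLT coefficients per camera (11-parameter convention:
           u = (L1 x + L2 y + L3 z + L4) / (L9 x + L10 y + L11 z + 1), etc.).
    uv:    (n_cams, 2) pixel coordinates of the landmark in each camera.
    conf:  (n_cams,) DLC prediction confidences; cameras below min_conf
           are ignored. At least two usable cameras are required.
    Returns (xyz, rmse_residual) or (None, None) if too few cameras.
    """
    use = np.where(conf >= min_conf)[0]
    if len(use) < 2:
        return None, None
    A, b = [], []
    for i in use:
        L = coefs[i]
        u, v = uv[i]
        # Rearranged projection equations: two linear rows per camera.
        A.append([u * L[8] - L[0], u * L[9] - L[1], u * L[10] - L[2]])
        A.append([v * L[8] - L[4], v * L[9] - L[5], v * L[10] - L[6]])
        b += [L[3] - u, L[7] - v]
    xyz, res, *_ = np.linalg.lstsq(np.array(A), np.array(b), rcond=None)
    rmse = np.sqrt(res[0] / (2 * len(use))) if res.size else 0.0
    return xyz, rmse
```

The least-squares residual returned alongside the point is the kind of quality metric that, together with DLC confidences, can guide the filtering described in (C) and (D).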

Case study 1: Steady flight in wind tunnel
The experiments took place in the Animal Flight and Aeromechanics (AFAM) Wind Tunnel Facility, Prince Engineering Labs, Brown University. We trained four Seba's short-tailed bats, C. perspicillata, to fly steadily in the AFAM wind tunnel test section, which has a volume of 1.2 m × 1.2 m × 1.4 m. We chose this species because of its availability in the laboratory, reliable flight in a wind tunnel, and common use in previous studies of flight biomechanics. 8,33-36 Study subjects (n = 4) were C. perspicillata from a captive-bred colony. All bats were housed with conspecifics, separated by sex, in 1.8 m × 2.4 m × 2.1 m mesh cages (mesh size: 12.7 mm × 12.7 mm) at Brown University in the Center for Animal Resources and Education under approved Institutional Animal Care and Use Committee protocol 19-01-0012. The temperature of the enclosures ranged from 24.4 to 26.7 °C, and the air humidity ranged from 60% to 70%. Bats were given access to food and water ad libitum. The bats' diet, formulated with veterinary guidance, consisted of a specialized wet food blend (incorporating monkey chow, calcium, multivitamins, peaches, nectar, corn syrup, and water) and assorted fresh fruits (including bananas, apples, melons, and pears). To facilitate daytime experiments, the bats were housed under a reverse light/dark cycle, with 12 h of light followed by 12 h of complete darkness, in compliance with USDA regulations. The subjects were acclimated to the wind tunnel in training sessions, around twice per week for 6 weeks, at varying wind speeds. One researcher was present in the wind tunnel downstream of the bats during their flights and would catch the bats by hand at the end of their trials before returning them to their home cage. The bats flew with and without markers applied to 16 anatomical landmarks (Figure 3A) at three different speeds (3, 4.5, and 6 m/s), while we obtained video of their flight with four high-speed cameras (Phantom Miro 340, Vision Research) filming at 700 frames per second, equipped with 18−55 mm zoom lenses (Nikon, f/2.8). We arranged the cameras around the focal volume where the bats flew, with one filming obliquely from the lower right, one obliquely from the lower left, one obliquely from the top right, and one vertically from above. The test section was illuminated with four white lights (Veritas Constellation 120), each approximately tracking the direction of the closest camera (see Figure S2a).

Case study 2: Free flight enclosure
To determine how well our automated workflow performs on a dataset of bats without markers executing complex behaviors in a naturalistic setting, we filmed individuals of northern yellow bats (D. intermedius), eastern red bats (L. borealis), and evening bats (N. humeralis) at the Austin Bat Refuge in June 2020, using three frame-synchronized, high-speed cameras (one Fastec TS5 and two Fastec IL5), sensitive to both visual and near-infrared light, and equipped with 20 mm lenses (Nikkor 20 mm f/2.8 Manual Focus), at 800 frames per second (fps) with a resolution of 1280 × 1024 pixels. We illuminated the scene with three triple-panel infrared (850 nm) illuminators (Raytec VAR2-I8-3 Long-Range Triple-Panel Semi-Covert IR Illuminator). The cameras were mounted on top of the infrared light panels such that the center panel and the camera faced approximately the same direction. All cameras were aimed obliquely upward from one side of the flight arena, and the focal volume was calibrated using a custom version of EasyWand that used digitized landmarks on the bats and the distances between the cameras, rather than a wand, as calibration objects. This resulted in an average reprojection error of 0.33 pixels. The bats flew freely in a large L-shaped enclosure (long side 14.1 m, short side 9.1 m, width 6.1 m, height 3.3 m; see Figure S2b for a schematic of the flight enclosure and approximate placement of cameras and lights). As the bats flew, they performed unprompted turns, fed on moths, and navigated their enclosure, while we triggered recordings opportunistically.
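The calibration quality metric reported above, mean reprojection error, can be computed by projecting reconstructed 3D points back into each camera with its DLT coefficients and measuring the pixel offset from the observed detections. A sketch under the same 11-coefficient DLT convention; the function names are ours, not from EasyWand or DLTdv:

```python
import numpy as np

def dlt_project(L, xyz):
    """Project a 3D point to pixel coordinates with 11 DLT coefficients."""
    x, y, z = xyz
    den = L[8] * x + L[9] * y + L[10] * z + 1.0
    u = (L[0] * x + L[1] * y + L[2] * z + L[3]) / den
    v = (L[4] * x + L[5] * y + L[6] * z + L[7]) / den
    return np.array([u, v])

def mean_reprojection_error(coefs, points_3d, points_2d):
    """Mean pixel distance between observed and reprojected 2D points.

    coefs:     (n_cams, 11) DLT coefficients.
    points_3d: (n_pts, 3) reconstructed calibration points.
    points_2d: (n_cams, n_pts, 2) observed pixel coordinates.
    """
    errs = [np.linalg.norm(dlt_project(L, p) - points_2d[c, i])
            for c, L in enumerate(coefs)
            for i, p in enumerate(points_3d)]
    return float(np.mean(errs))
```

A sub-pixel value of this statistic, such as the 0.33 pixels reported here, indicates a tight geometric fit between cameras.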

FIGURE 4 Coordinate data (x, y, and z rows, i.e., craniocaudal, proximodistal, and dorsoventral, respectively) for manually and automatically tracked points on the wing and body at three speeds (3, 4.5, and 6 m/s columns, respectively), averaged across all trials within each speed and treatment combination. The dark teal lines indicate mean results of manual tracking, while the magenta lines show results using our automatic method. Shaded regions indicate 95% confidence intervals.
FIGURE 6 (A) Example trajectories of the automatically digitized landmarks of a bat performing a banked turn from right to left in case study 2. A reconstructed bat based on landmark coordinates is shown at the end of the trajectory. Bold lines are scale bars of 20 cm. The cartoon bat shows the locations of tracked anatomical landmarks, color-coded to match the trajectories. (B) Average distance between landmarks placed by two different human digitizers (teal) compared to the distance between landmarks placed using our automatic workflow and a human digitizer (magenta), for each anatomical landmark. The diameters of the circles represent the distances, scaled by half the wingspan of the bat (10.55 cm). Note that the ventral side of the bat is visible in the image. White circles of 1−5 mm are shown for reference. See Table 2 for corresponding numerical values.

FIGURE 7 Comparison of accuracy of DeepLabCut (DLC) networks plus subsequent processing steps against human-human variation in case study 2. Data points denote the mean distance in 3D space between landmark coordinates resulting from manual digitization by one randomly selected human per trial and automatic digitization, or, for the rightmost data point (i.e., human), the mean distance between coordinates from manual digitization by two humans. An error of zero would mean an exact match between compared coordinates. "DLC baseline" denotes the result from using a DLC network trained without flip left-right augmentation. "Flip l.r. aug. added" shows results from the network with the same settings but using the flip left-right augmentation (fliplr in DLC). "Final workflow" denotes the error resulting from using our complete workflow, including custom filtering and left-right swap fixing based on quality metrics from both DLC and DLTdv. Error bars show 95% confidence intervals. Abbreviations: DLC, DeepLabCut; flip l.r. aug., flip left-right augmentation.
Drawing on the strengths of DLC and DLTdv, our workflow successfully delivers highly accurate automatic 3D tracking for the study of bat flight biomechanics. The demonstrated performance, within 1 mm of human digitization accuracy, opens the door to more efficient and reliable quantification of complex bat behaviors. By significantly reducing the time and effort required for data processing, our method paves the way for large-scale studies of bat flight behavior in both controlled and natural settings. In addition to providing a blueprint for how to apply deep learning models to the study of animal flight, we provide training data and initial network weights that can be used as a basis for DLC networks for automatic tracking of bat flight kinematics. We hope that future studies will also share digitized data. This can profoundly help the future development of bat-specific deep learning networks that generalize to diverse species and study sites, and such datasets can also serve as valuable benchmarks. The recent development of SuperAnimal models, as demonstrated in Ref. 47, presents an exciting opportunity for advancing the potential of old and new datasets of digitized bat flight videos to further the study of bat behavior and biomechanics. A key strength of SuperAnimal models is their ability to combine highly generalized training data with different numbers and positions of landmarks, enabling researchers to create deep learning networks that can track various landmark configurations. The development of a SuperAnimal model specifically for bats (a SuperBat model) could leverage this capability to combine datasets from multiple studies with differing digitizing schemes. Moreover, the integration of a SuperBat model with our workflow combining DLC and DLTdv could lead to even greater accuracy and efficiency in quantifying bat flight biomechanics, with minimal additional training data needed per study. In a field where the painstaking detail of manual digitization has long been a limiting factor, robust deep learning workflows can serve as a catalyst to transform the landscape of videography-based flight research.

Manual | Adjusted p-value (speed) | Adjusted p-value (method) | Adjusted p-value (interaction)
Note: See Table S5 for additional statistics. Bold indicates statistical significance.