Automatic detection of migrating soaring bird flocks using weather radars by deep learning

The use of weather radars to detect and distinguish between different biological patterns greatly improves our understanding of aeroecology and its consequences for our lives. Importantly, it allows us to quantify passerine bird migration at different scales. Yet, no algorithm to detect soaring bird flocks in weather radar is available, precluding our ability to study this type of migration over large spatial scales. We developed the first automatic algorithm for detecting the migration of flocks of soaring birds, an important bio‐flow phenomenon involving many millions of birds that travel across large spatial extents, with implications for risk of bird‐aircraft collisions. The algorithm was developed with a deep learning network for semantic segmentation using U‐Net architecture. We tested several models with different weather radar products and with image sequences for flock movement identification. The best model includes the radial velocity product and a sequence of two previous images. It identifies 93% of soaring bird flocks that were tagged by a human on the radar image, with a false discovery of less than 20%. Large birds such as those detected by the algorithm pose a serious risk for flight safety of civilian and military transportation and therefore the application of this algorithm can substantially reduce bird‐strikes, leading to reduced financial losses and threats to human lives. In addition, it can help overcome one of the main challenges in the study of bird migration by automatically and continuously detecting flocks of large birds over wide spatial scales without the need to equip the birds with tracking devices, unravelling the abundance, timing, spatial flyways, seasonal trends and influences of environmental conditions on the migration of bird flocks.


| INTRODUCTION
It is increasingly recognized that weather radars, which are primarily designed to monitor meteorological processes for weather forecasts, detect trillions of insects, bats, and birds in the air (Gauthreaux et al., 2008). Consequently, weather radars around the world are exploited to better understand different broad scale behaviours and movement of aerial organisms in detail, including quantification of biomass fluxes (Dokter et al., 2018; Farnsworth et al., 2016; Hu et al., 2016; Nilsson et al., 2019; Van Doren & Horton, 2018) and mapping of stopover sites along migration flyways (Buler & Dawson, 2014; Cohen et al., 2021; Schekler et al., 2022). In addition, extracting biological data from weather radars allows us to manage human-wildlife conflicts such as flight safety (Ginati et al., 2010; Kranstauber et al., 2022; Van Gasteren et al., 2018), crop damage (Markkula et al., 2008), risks due to collision with wind energy facilities (Cohen et al., 2022), and the dispersal of pathogens (Acosta et al., 2021) and pollinators (Wotton et al., 2019). These efforts minimize financial consequences, provide economic incentives, decrease risks to human lives and conserve aerial animals (Bauer et al., 2017).
Collisions between birds and aircraft are a serious aviation hazard that costs billions of dollars annually (Allan, 2000; Anderson et al., 2015) and can have detrimental consequences for human lives, mainly during low-level flights, including take-off and landing (Van Gasteren et al., 2018). In 2002, it was estimated that at least 350 people had been killed due to bird-aircraft collisions worldwide (Sodhi, 2002). High body-mass bird species are more hazardous to aircraft (Dolbeer et al., 2000), and therefore large birds such as vultures, cranes, pelicans and eagles have the highest potential of causing severe damage (Anderson et al., 2015; Dolbeer et al., 2000).
For example, in the collision database of the Israeli Air Force, which includes data from 1968 to the present, with over 6700 documented collisions (data not published), the collisions are divided into three categories: minor, medium and severe strikes. The minor category includes damage in the range of 0-50,000 US dollars, without human injury, while the medium and severe strikes include damage of 50,000-2,000,000 US dollars and may include human injury and mortality. Inspection of this database revealed that about 80% of the medium and severe strikes of the Israeli Air Force have been caused by large birds (body mass > 200 g).
Bird migration is a world-wide phenomenon in which, each spring and fall, billions of birds migrate between breeding and nonbreeding regions. There are two basic flight strategies of migratory birds over land. The first is by flapping, which is done mainly by small birds such as most passerines and waders (Hedenstrom, 1993; Newton, 2008), primarily during the night.
These birds tend to spread across the migration flyway and commonly migrate over the sea. The second is by soaring-gliding flight, which is done mainly by large species with a large wing surface area relative to their body mass, allowing them to utilize rising air currents during migration while saving energy (Hedenstrom, 1993; Sapir et al., 2010). These large birds migrate primarily during the day when updrafts are available, usually avoid flight over the sea, and often travel in concentrated streams where topography favours the development of updrafts (Newton, 2008). Further, many of these birds tend to flock and may form aggregations of thousands of birds (Leshem & Yom-Tov, 1996a). Although automatic algorithms for passerine migration have recently been developed and applied (Buler & Moore, 2011; Dokter et al., 2011), resulting in significant progress in our knowledge of this migration (Cabrera-Cruz et al., 2018; Cohen et al., 2021; Dokter et al., 2018; McLaren et al., 2018; Nilsson et al., 2019; Rosenberg et al., 2019), no algorithm has yet been developed for automatic detection of soaring bird flocks.
Israel is located next to the largest marine ecological barrier in the Palearctic-Afrotropical bird migration system, the Mediterranean Sea, as well as next to another large water body, the Red Sea. Because soaring birds depend on updrafts for their migration, they tend to avoid long sea crossings (Newton, 2008), resulting in high densities and diversity of soaring birds migrating through Israel (Shirihai, 1996). For some species, the entire world (Lesser Spotted Eagle and Levant Sparrowhawk) or Palaearctic (White Pelican) population passes over Israel during migration, and for other species (White Stork and Honey Buzzard) a high percentage of the population flies through the country (Leshem & Yom-Tov, 1996a; Shirihai, 1996). While Israel is a very small country (the 46th smallest country out of 195; United Nations Statistics Division, 2021), it has a very active air force (in terms of flight hours; FlightGlobal, 2022).
This results in an intense conflict due to the high risk of collisions between birds and military aircraft. Countries such as Israel that have implemented migration warning systems have shown a considerable decrease in the number of bird-aircraft strikes over the last few decades (Van Gasteren et al., 2018). Bird strikes can roughly be divided into local (during take-off and landing) and en route (on other low-altitude flights) risks (Van Gasteren et al., 2018). Locally, the populations of birds on airfields can be controlled or manipulated to some extent in order to reduce the collision risks. However, this is impossible for en route risks that involve populations of migratory birds. Therefore, to reduce the risk of bird strikes, birds and aircraft must be separated by dynamically detecting bird distribution and providing near real-time warning to aircraft to avoid hazardous areas.
The aim of this study is to develop an automatic algorithm that will identify soaring bird flocks in weather radars. We used a deep learning method for segmentation (U-Net) and tagged thousands of images to create a database for training and testing the model.

Although weather radars have been used in ornithological research for several decades, to the best of our knowledge, this is the first study that distinguishes soaring bird flocks in weather radar echoes.

| Radar data
We obtained raw data from two single-polarization weather radar stations operated by the meteorology unit of the Israeli Air Force at Mt. Meron (MER) in Northern Israel and Mt. Aricha near the town of Mitzpe Ramon (RAM) in the south of the country (Figure 1). Due to the proximity of these two stations to international borders, the MER radar also covered some areas in Lebanon and Syria, while the range of the RAM radar also included parts of the Egyptian Sinai Peninsula (Figure 1). The radars emit C-band electromagnetic waves and record returned signals, which are provided as a set of three gridded data products: reflectivity factor, radial velocity, and spectrum width (WRADH). Reflectivity factor is proportional to the power of the signal received from objects in the atmosphere; radial velocity is computed from the Doppler frequency shift of targets within the radar sampling volume and provides information about the velocity of the targets; and spectrum width is a measure of the variability of Doppler velocities within the sampling volume (Doviak & Zrnić, 2006). The polar resolution is ~125 m in range and 1° in azimuth. In the years of the study (2018-2019), the radar reflectivity product was highly contaminated by clutter and other radar transmissions in the surrounding area, and therefore the reflectivity of bird flocks was barely visible in this product. Instead, we used the radial velocity product, as preliminary analysis showed that the pattern of migrating flocks was clearest in this product. We used the bioRad package version 0.5.2 to project the polar volume scan to a georeferenced Cartesian grid in the form of a plan position indicator (PPI) with a range of 50 km from the radar.
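The projection from polar coordinates to a Cartesian image can be illustrated with a minimal nearest-neighbour sketch in Python. This is not the bioRad implementation (bioRad is an R package and handles georeferencing properly); the function name `polar_to_cartesian`, the flat-earth geometry and the 256-pixel default grid are assumptions made for illustration. Only the 125 m range bin, the 1° azimuth bin and the 50 km maximum range are taken from the text.

```python
import math

def polar_to_cartesian(polar, range_res_m=125.0, az_res_deg=1.0,
                       max_range_m=50_000.0, size=256):
    """Nearest-neighbour resampling of one polar sweep onto a square grid.

    polar[az_index][range_index] holds one radar product (e.g. radial velocity).
    Returns a size x size list of lists; pixels beyond max_range_m, or beyond
    the available range bins, are None.
    Assumes flat-earth geometry and azimuth measured clockwise from north.
    """
    n_az = len(polar)
    n_rng = len(polar[0])
    pixel_m = 2.0 * max_range_m / size
    grid = [[None] * size for _ in range(size)]
    for row in range(size):
        # Row 0 is the northern edge of the image.
        y = max_range_m - (row + 0.5) * pixel_m
        for col in range(size):
            x = (col + 0.5) * pixel_m - max_range_m
            rng = math.hypot(x, y)
            if rng >= max_range_m:
                continue  # outside the 50 km coverage circle
            az = math.degrees(math.atan2(x, y)) % 360.0
            az_idx = int(az / az_res_deg) % n_az
            rng_idx = int(rng / range_res_m)
            if rng_idx < n_rng:
                grid[row][col] = polar[az_idx][rng_idx]
    return grid
```

With these defaults each pixel covers roughly 390 m, so several 125 m range bins fall into one pixel and the nearest one is kept; an operational projection would normally interpolate or aggregate instead.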
Soaring birds, such as birds of prey, storks and pelicans, have been measured flying at altitudes of several hundreds of meters up to several kilometres above ground level (AGL; Kerlinger & Gauthreaux, 1985; Leshem & Yom-Tov, 1996b) with average migration heights of 640 m AGL (Kerlinger & Gauthreaux, 1985). Both RAM and MER are relatively high (868 and 1214 m above sea level, ASL, respectively) and the radars are positioned high above their surrounding areas. In the range of 50 km from each radar, the average elevation of the ground is 320 m ASL in MER (548 m below the radar height) and 412 m ASL in RAM (802 m below the radar height). Therefore, we used the lowest positive tilt angle for optimally detecting the flocks in the radars' ranges (0.0° or 0.4°, as the screening protocol changed between different operational modes and years due to changes in radar settings).
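The reasoning about tilt angle and radar height can be made concrete with the commonly used 4/3 effective Earth radius approximation for beam height under standard refraction. The sketch below is illustrative only: the function name `beam_height_m` and the constants are assumptions, and the study does not state that this calculation was performed.

```python
import math

EARTH_RADIUS_M = 6_371_000.0
K_EFFECTIVE = 4.0 / 3.0  # standard-refraction assumption

def beam_height_m(range_m, elevation_deg, radar_height_m=0.0):
    """Height of the beam centre, in the same reference frame as radar_height_m."""
    r_eff = K_EFFECTIVE * EARTH_RADIUS_M
    sin_el = math.sin(math.radians(elevation_deg))
    return (math.sqrt(range_m ** 2 + r_eff ** 2 + 2.0 * range_m * r_eff * sin_el)
            - r_eff + radar_height_m)
```

Under this approximation a 0° beam rises about 150 m above the antenna at 50 km range, and a 0.4° beam about 500 m, which is why a radar placed well above the surrounding terrain samples flocks best with its lowest tilt.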

| Flock detection
First, we wanted to be able to manually detect and distinguish patterns of migrating soaring bird flocks from other targets detected by the radars, such as wide-front passerine migration, ground clutter, and rain clouds. For this purpose, we used citizen science reports from two main sources: (1) eBird (https://ebird.org/israel/home) reports from Israel and (2) We used the online open source data labelling tool LabelStudio (https://labelstud.io/) and tagged the soaring bird flocks by tagging the area of the flocks with a brush (and not by pixel). Flock identification was done through a sequence of several images that facilitated the detection of flock movement. Therefore, the tagging process identified the flocks by their spatial pattern and their movement, and the tagging was done separately in each image.
The tagged images were considered the ground truth for the training and the testing steps of the algorithm development.

| Training and testing data
In addition to bird flock identification, it was important to us that the algorithm could distinguish soaring bird migration from other patterns in the radar images such as rain clouds and passerine migration. Because soaring birds depend on thermals for migration, rain clouds tend to suppress migration (Newton, 2008) and therefore we had only a few examples of rain clouds in the images containing soaring birds, and usually they were relatively small clouds. To create data to train the model (see below) that would include additional images of rain clouds, we added images containing rain clouds from radar data collected during December 2019, because rain in Israel is limited before December. In addition, passerine and soaring bird migration occur at different parts of the day. Soaring birds migrate from about 2 h after sunrise to about an hour or two before sunset (Newton, 2006), while passerine migration typically ranges from sunset to sunrise. To include passerine migration in our model, we added images from spring and autumn 2018 mainly containing passerine migration.

FIGURE 1
Satellite imagery of the study area with the weather radars (green triangles), their 50-km-radius coverage (white circles) and international borders (black lines). MER, Mt. Meron; RAM, Mitzpe Ramon.

The migration of flocks of soaring birds is not constant over time and intense migration usually takes place on only several days during the migration season (Leshem & Yom-Tov, 1996a).
Therefore, there are many days during the migration period when no specific patterns are evident in the radar images, and we wanted the algorithm to handle this case as well. We thus added images that did not contain any specific patterns, selecting images from June 2019, after the end of the spring migration of soaring birds. In total we created 7509 images for the model. Consecutive images from the same day can be similar because flocks tend to concentrate in the same areas where there is strong uplift (Newton, 2008). To prevent the model from being tested on images that closely resemble (consecutive) images used for training, we divided the data into training and testing datasets by day. Therefore, the test dataset contained only images from days that were not used for training. First, we randomly chose days for testing such that the testing dataset comprised 20% of the data and the training dataset included the remaining 80% of the data. After choosing the best model, we estimated its performance by five-fold cross-validation with a proportion of 20%-30% of test data (depending on the number of images in each day). We used only the lowest positive scan for model training to facilitate bird flock detection, as the radars were positioned high above their surrounding areas. To test the model on higher elevation scans, we tagged a few examples from a scan with a 1° angle of elevation. We did not find patterns of migrating soaring birds on images from scans higher than 1° elevation.
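The day-wise split described above can be sketched as follows. This is a minimal illustration rather than the code used in the study; the function name `split_by_day`, the dictionary input and the fixed random seed are assumptions. Whole days are assigned to the test set until it holds roughly the requested fraction of images, so no day contributes to both sets.

```python
import random
from collections import defaultdict

def split_by_day(image_days, test_fraction=0.2, seed=0):
    """Assign whole days to the test set until it holds about test_fraction of images.

    image_days maps an image id to its acquisition day (any sortable label).
    Returns (train_ids, test_ids); no day appears in both lists.
    """
    by_day = defaultdict(list)
    for image_id, day in image_days.items():
        by_day[day].append(image_id)
    days = sorted(by_day)
    random.Random(seed).shuffle(days)  # random but reproducible choice of test days
    target = test_fraction * len(image_days)
    train_ids, test_ids = [], []
    for day in days:
        # Fill the test set first; all remaining days go to training.
        if len(test_ids) < target:
            test_ids.extend(by_day[day])
        else:
            train_ids.extend(by_day[day])
    return train_ids, test_ids
```

Because days differ in their number of images, the realized test fraction can exceed the target, which is consistent with the 20%-30% range reported for the cross-validation folds.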

| U-Net
In

| Trained models
We tested the performance of five different models. We started with radial velocity radar images, where the migration of the flocks was the clearest. In the second model we added the radar parameter WRADH as used in other deep learning methods for radars (Lin et al., 2019). While tagging the images, it helped to look at previous images for detecting the flock's movements. Therefore, in the third model, we used the radial velocity parameter and added the previous image but only if the time gap between the previous and current images was less than 7 min. In the fourth model, we used two previous images of the radial velocity (again, only if the time gap between both of the images was less than 7 min), and in the fifth model, we used three previous images.
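The construction of the multi-image inputs for the third to fifth models can be sketched as below. The names `build_input` and `MAX_GAP` are hypothetical, and two details are assumptions not stated in the text: a sample is skipped (rather than padded) when a required earlier image is unavailable, and the 7 min limit is applied to each pair of consecutive images.

```python
from datetime import datetime, timedelta

MAX_GAP = timedelta(minutes=7)

def build_input(scans, index, n_previous):
    """Stack the current radial-velocity image with n_previous earlier images.

    scans is a time-ordered list of (timestamp, image) tuples.
    Returns [current, previous_1, ..., previous_n], or None when an earlier
    image is missing or any gap between consecutive images is 7 min or more.
    """
    if index < n_previous:
        return None
    channels = [scans[index][1]]
    for k in range(1, n_previous + 1):
        newer_time = scans[index - k + 1][0]
        older_time, older_image = scans[index - k]
        if newer_time - older_time >= MAX_GAP:
            return None
        channels.append(older_image)
    return channels
```

In this sketch, `n_previous` of 1, 2 and 3 corresponds to the third, fourth and fifth models, respectively, and the returned list would be stacked along the channel axis before being passed to the network.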

| Evaluation method
First, we used three metrics to evaluate our model at the pixel level: accuracy, F1 score and AUC. Accuracy is the fraction of correct predictions out of the total number of predictions. This measure can be biased by the distribution of classes, and in our case, because most or all of the pixels in an image were negative (without soaring bird migration), a naïve classifier that predicts all negatives would appear to perform well. Therefore, other metrics are needed to better evaluate model performance. The F1 score is the harmonic mean of precision and recall, where precision is the ratio of correctly predicted soaring bird flock observations (TP) to the total predicted positive observations (TP + FP), and recall (a synonym for the true positive rate, TPR) is the fraction of true soaring bird flocks that the model predicts to be flocks, TP/(TP + FN). The false positive rate (FPR) is the fraction of "not soaring bird flocks" that the model wrongly predicts to be flocks, FP/(FP + TN), and the ROC curve plots TPR versus FPR for varying probability thresholds (Fawcett, 2006). The AUC metric is the area under the ROC curve and aggregates performance across all possible classification thresholds (Fawcett, 2006). In addition to the common pixel-level evaluation methods, we were interested in evaluating how many flocks of migrating birds the model correctly identified. The tagging was done not pixel by pixel but with a brush that marked the contour of the flocks (Figure 3). Therefore, after obtaining initial results from the pixel-level metrics, we evaluated the models at the contour level. For this purpose, we compared the contours we tagged to the contours found by the model.
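The pixel-level metrics can be illustrated with a short, dependency-free Python sketch. The function names are hypothetical and this is not the evaluation code of the study; the AUC is computed through its equivalent rank interpretation (the probability that a randomly chosen positive pixel receives a higher score than a randomly chosen negative one, with ties counted as one half).

```python
def safe_div(num, den):
    """Return num / den, or 0.0 when the denominator is zero."""
    return num / den if den else 0.0

def pixel_metrics(truth, predicted):
    """Accuracy, precision, recall (TPR), FPR and F1 from two flat 0/1 sequences."""
    tp = fp = tn = fn = 0
    for t, p in zip(truth, predicted):
        if t and p:
            tp += 1
        elif p:
            fp += 1
        elif t:
            fn += 1
        else:
            tn += 1
    precision = safe_div(tp, tp + fp)
    recall = safe_div(tp, tp + fn)
    return {
        "accuracy": safe_div(tp + tn, tp + fp + tn + fn),
        "precision": precision,
        "recall": recall,
        "fpr": safe_div(fp, fp + tn),
        "f1": safe_div(2 * precision * recall, precision + recall),
    }

def auc(truth, scores):
    """Area under the ROC curve via pairwise comparison of positives and negatives."""
    pos = [s for t, s in zip(truth, scores) if t]
    neg = [s for t, s in zip(truth, scores) if not t]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return safe_div(wins, len(pos) * len(neg))
```

For an image in which 2 of 100 pixels are flock pixels, predicting "no flock" everywhere yields an accuracy of 0.98 but a recall and F1 of 0, which illustrates why accuracy alone is uninformative for such imbalanced data.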
We compared the centres and the areas of the contours in the ground truth and the predicted images, but because one flock of birds can be tagged as a single long line or as a few smaller dots (Figure 3), we allowed the following margins during the comparison. First, we defined a size criterion by which we allowed the contours to vary between one third of the size of the compared contour and three times the size of the compared contour. Second, we defined a location criterion by which we allowed the centre of the contour to lie within a square extending 30 pixels in each direction from the centre of the compared contour (the size of each image was 256 × 256 pixels). For example, if the model predicted a contour (soaring bird flocks) with a centre 10 pixels to the side of a contour in the ground truth image, and with a size twice that of the contour in the ground truth image, we defined it as a contour predicted correctly. For the evaluation, we counted how many contours were correctly identified (TP) in each image, how many contours the model did not predict (FN), and in addition, how many contours the model predicted that were not in the ground truth image (FP). At the contour level, we do not have a TN because the area that is not a contour (not predicted as soaring bird flocks) cannot be counted as such. Consequently, we used two other metrics to evaluate our model at the contour level:
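A minimal Python sketch of the contour comparison is given here under several stated assumptions. Contours are represented as hypothetical `(cx, cy, area)` tuples; a tagged contour counts as a TP if any predicted contour satisfies both criteria, and a predicted contour counts as an FP if it matches no tagged contour. The two contour-level metrics are assumed to be the TPR and FPDR reported in the Results, with TPR = TP/(TP + FN) and FPDR taken to be FP/(TP + FP); the FPDR formula in particular is an assumption, not a definition given in the text.

```python
def is_match(truth, predicted, max_offset=30, max_ratio=3.0):
    """Apply the size and location criteria to two contours given as (cx, cy, area)."""
    tx, ty, t_area = truth
    px, py, p_area = predicted
    # Size criterion: between one third and three times the compared contour.
    size_ok = t_area / max_ratio <= p_area <= t_area * max_ratio
    # Location criterion: centre within a square of 30 pixels in each direction.
    location_ok = abs(px - tx) <= max_offset and abs(py - ty) <= max_offset
    return size_ok and location_ok

def contour_scores(truth_contours, predicted_contours):
    """Return (TP, FN, FP, TPR, FPDR) for one image or a pooled set of images."""
    tp = sum(1 for t in truth_contours
             if any(is_match(t, p) for p in predicted_contours))
    fn = len(truth_contours) - tp
    fp = sum(1 for p in predicted_contours
             if not any(is_match(t, p) for t in truth_contours))
    tpr = tp / (tp + fn) if tp + fn else 0.0
    fpdr = fp / (tp + fp) if tp + fp else 0.0  # assumed definition
    return tp, fn, fp, tpr, fpdr
```

For the example in the text, a predicted contour centred 10 pixels to the side of a tagged contour and twice its size passes both criteria and is counted as correctly predicted.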

| RESULTS
The best model for detecting soaring birds is the fourth model, which used the radial velocity parameter and two previous images (Table 1). This model had the highest AUC and TPR scores, and recognized 93% of the labelled soaring bird flocks when comparing all the models with the same test data. After choosing this model, we ran it with five-fold cross-validation and obtained a TPR of 0.83 ± 0.1 and an FPDR of 0.15 ± 0.15. The second-best model was the fifth model, which included the radial velocity parameter and three previous images (Table 1). This model had the highest F1 score (but not significantly different from the fourth model) and the second best TPR. As expected from our negative-biased data, all of the models were characterized by high accuracy scores of 0.98-0.99. Adding the WRADH parameter did not improve the model performance.

FIGURE 3
Example of two ways to tag the same image correctly from RAM radar: (a) with small dots and (b) with longer lines. We created a method to evaluate the success of the model by contours which considers both ways of tagging. RAM, Mitzpe Ramon.

Examples of the predictions made by the chosen fourth model compared with the tagged ground truth images are shown in Figures 4 and 5 for the MER and RAM radars, respectively. For model training, we used only the lowest positive angle of scan because of the relatively high elevations of the radars. Soaring bird migration at higher elevations was harder to detect, and it was even harder to find consecutive images with patterns of soaring flock migration (the model needs three consecutive images). When we did find soaring bird migration patterns in the one-degree scans, they looked similar to flocks in days with scarce migration. Despite these limitations, we examined our model with higher angle scans and found that the model predicts the migration pattern with similar success as expected (Figures 4 and 5).

TABLE 1
The performance of the different models at the pixel and contour levels. In the case of the best model (in bold), we ran it again with a five-fold cross-validation (the results in parentheses).

FIGURE 4
Example of semantic segmentation results for the MER radar from the 28 August 2018 at 11:44 AM. On this day, many flocks of Honey Buzzards migrated through Israel. The left image is the mask ground truth with red marks of tagged soaring bird flocks over radial velocity radar image, the middle one is the estimated mask produced by the model with blue marks over radial velocity radar image, and the right image is the estimated mask produced by the model (blue) over the mask ground truth (red). The first row is for the 0° scan and the second row is for the 1° scan. MER, Mt. Meron.

| DISCUSSION
The ability to detect and distinguish between different biological patterns that are created by various phenomena in weather radars has an enormous effect on our understanding of aeroecology and its consequences for our lives (Acosta et al., 2021; Bacciu et al., 2019; Cohen et al., 2022; Nilsson et al., 2021; Van Doren & Horton, 2018).
In this study, we developed the first automatic algorithm for detecting flocks of migrating soaring birds, an important biological phenomenon involving many millions of birds that travel across large spatial extents. Soaring birds migrate all around the world and therefore automatic detection of their migration with weather radars will allow us to study this amazing phenomenon by quantifying the birds' abundance and spatial and temporal distribution, as well as by exploring different factors affecting them. Importantly, large soaring birds pose the highest risk for flight safety (Dolbeer et al., 2000) and
While tagging the images, in many cases, it was hard to distinguish the soaring migration pattern when considering only a single radar image, and the flock signal became clearer when looking at a sequence of images due to its movement. Using temporal information to improve models for extracting biological scatter from weather radars was also suggested by Lin et al. (2019) to improve MistNet performance. Therefore, it is not surprising that the two best models are the ones with two and three previous images in a sequence, allowing the machine to learn the pattern of movement which aids in the process of flock identification.
According to its performance, the best model is the fourth model (with the radial velocity parameter and two previous images). This model detected a fraction of 0.83 ± 0.1 of the soaring bird flocks that were tagged by a human, with an FPDR of 0.15 ± 0.15. It is nevertheless important to note that some of the flocks that the model found but that were not tagged may not be errors of the model; rather, they may be flocks that the person who tagged the images missed. In addition, from the applied perspective
Extending the success of deep learning-based image semantic segmentation techniques to the video domain has recently become a major focus in studies of computer vision (Wang et al., 2021). Further improvement of the algorithm may include deep learning methods for video semantic segmentation while creating a video from a sequence of images from the radar. In addition, this study uses single-polarization weather radars, whereas in many western countries the common radar nowadays is a dual-polarization (dual-pol) radar. Using dual-pol weather radar products could substantially increase the performance of the model, as algorithms that include dual-pol products are more efficient in identifying birds, precipitation, and ground clutter than single-polarization ones (Radhakrishna et al., 2019). Dual-pol radars emit and detect radio waves in both the vertical and the horizontal polarizations (Stepanian et al., 2016), which provides information regarding the object shape (height-to-width ratio) and uniformity within a pulse volume. This information helps to discriminate different types of objects, including birds and insects (e.g. Melnikov et al., 2015; Stepanian et al., 2016) and improves the accuracy of separating different hydrometeor types (Kilambi et al., 2018; Ye et al., 2015).
In addition, adding the reflectivity parameter to the model, which was not usable in our case because of the high noise in this product, can also improve the model results, as most algorithms for detecting birds in weather radars use this parameter (Buler & Dawson, 2014; Dokter et al., 2011). We note that the algorithm we developed is used to detect soaring bird flocks during migration, and in Israel most of this type of migration involves raptors or large water birds such as cranes, storks and pelicans that are primarily soaring migrants.
Although the migration of geese and other large birds that are primarily flapping migrants is rather scarce in our region, this algorithm is probably capable of detecting the migration of flapping flyers that migrate in flocks as the pattern of this type of migration appears similar in radar images. Yet, it might require some fine-tuning of the model with tagged data.
The existing methods for researching soaring bird migration include mainly different kinds of telemetry, primarily using GPS tags (Fielding et al., 2022; Kumar et al., 2020; Vignali et al., 2022). These methods allow tracking of up to several dozen individuals from a certain population due to financial and logistical challenges. In addition, bird tagging might affect bird flight biomechanics and could hamper migration performance and overall fitness (Arlt et al., 2013; Costantini & Møller, 2013). Studying soaring bird migration using weather radars with the proposed algorithm can help overcome one of the main gaps in the study of bird migration (Robinson et al., 2010) by simultaneously following many individuals over large spatial scales without the need to equip them with tracking devices.
Importantly, where widespread radar networks are available (e.g. most of the U.S.), the tracking of flocks could be done over hundreds and even thousands of kilometres.
More importantly, the model we describe also provides a platform for the construction of predictive modelling of soaring bird migration, producing quantitative, spatially explicit forecasts by combining the results from this model with atmospheric parameters, as already done in the case of passerine migration (Van Doren & Horton, 2018). Such predictions may allow better use of the services migratory soaring birds provide and could reduce their negative consequences, such as collisions with aircraft. Towards this end, our model can be used to quantify the characteristics of migrating soaring bird flocks, create long-term and large-scale monitoring tools, as well as forecast migration across continents (Bauer et al., 2017).
Our algorithm is the first model allowing automated quantification of flocks of soaring migrating birds. From previous observations (not published), we estimate that this algorithm can detect a flock of soaring birds from a size of about 30 individuals, but further validation is needed for accurate estimation of bird flock detectability in radar.
Birds of prey, which migrate by soaring, are one of the groups most sensitive to wind turbine related mortality (Desholm, 2009), and because their migration tends to take place in a few peak migration days (Leshem & Yom-Tov, 1996a), a predictive model can mitigate this mortality by shutting down the turbines on these specific occasions. In addition, birds of prey suffer from illegal poaching, which poses the highest threat, in terms of percentage of the total population, to different raptor species (Brochet et al., 2019). Therefore, predicting days with intense migration can be an efficient way to substantially reduce raptor mortality by concentrating efforts for stopping illegal poaching to specific days. Data on the location and densities of soaring bird migration are essential for informed conservation of soaring migratory birds and for flight safety. Unfortunately, due to policy changes, access to radar data repositories that are essential for these analyses is becoming limited in Europe (Shamoun- , substantially hindering our ability to quantify biodiversity changes and identify their causes and consequences.

AUTHOR CONTRIBUTIONS
Inbal Schekler, Ilan Shimshoni and Nir Sapir conceived the study.
Inbal Schekler collected and processed radar data. Inbal Schekler, Tamir Nave and Ilan Shimshoni designed the methodology. Inbal Schekler analysed data. Inbal Schekler and Nir Sapir wrote the manuscript with input from all co-authors.

ACKNOWLEDGEMENTS
We thank the Data Science Research Centre at the University of Haifa for supporting this study. Additionally, we thank Klil Giladi for tagging the radar images and the SPNI for the data from eBird.

CONFLICT OF INTEREST STATEMENT
The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

PEER REVIEW
The peer review history for this article is available at https://www.webofscience.com/api/gateway/wos/peer-review/10.1111/2041-210X.14161.

DATA AVAILABILITY STATEMENT
The data supporting the results are available via Zenodo at https://doi.org/10.5281/zenodo.7466584 (Schekler et al., 2023).