Streamlining analysis methods for large acoustic surveys using automatic detectors with operator validation

Passive acoustic surveys are becoming increasingly popular as a means of surveying for cetaceans and other marine species. These surveys yield large amounts of data, the analysis of which is time‐consuming and can account for a substantial proportion of the survey budget. Semi‐automatic processes enable the bulk of processing to be conducted automatically while allowing analyst time to be reserved for validating and correcting detections and classifications. Existing modules within the Passive Acoustic Monitoring software PAMGuard were used to process a large (25.4 terabyte) dataset collected during towed acoustic ship transits. The recently developed ‘Multi‐Hypothesis Tracking Click Train Detector’ and the ‘Whistle and Moan Detector’ modules were used to identify occasions within the dataset at which vocalising toothed whales (odontocetes) were likely to be acoustically present. These putative detections were then reviewed by an analyst, and false positives were corrected. Target motion analysis provided perpendicular distances to odontocete click events, enabling the estimation of detection functions for both sperm whales and delphinids. Detected whistles were assigned to the lowest taxonomic level possible using the PAMGuard ‘Whistle Classifier’ module. After an initial tuning process, this semi‐automatic method required 91 hr of an analyst's time to manually review both automatic click train and whistle detections from 1,696 hr of survey data. Use of the ‘Multi‐Hypothesis Tracking Click Train Detector’ reduced the amount of data for the analyst to search by 74.5%, while the ‘Whistle and Moan Detector’ reduced the data to search by 85.9%. In total, 443 odontocete groups were detected, of which 55 were sperm whale groups, six were beaked whales, two were porpoises and the remaining 380 were identified to the level of delphinid group. Effective strip half‐widths of 3,277 m and 699 m were estimated for sperm whales and delphinids, respectively.
The semi‐automatic workflow proved successful, reducing the analyst time required to process the data and thereby substantially reducing overall project costs. The workflow presented here makes use of existing modules within PAMGuard, a freely available, open‐source software package that is readily accessible to acoustic analysts.


| INTRODUCTION
Researchers are increasingly using bioacoustics to monitor remote marine ecosystems. Passive acoustic monitoring (PAM) has been used to study cetaceans for several decades using a range of approaches.
Mobile sampling, using towed arrays (e.g. Gordon et al., 2020; Rone et al., 2014; Thode, 2004) or gliders (e.g. Bittencourt et al., 2018; Cauchy et al., 2020), can provide designed coverage of larger survey areas. Charter and running costs for dedicated vessels can become a major budgetary component of towed array surveys. However, automated data collection systems can allow towed PAM surveys to be carried out from platforms of opportunity with little human supervision of data collection. Platforms of opportunity are vessels at sea for other purposes which can allow ancillary data collection. Typically, vessel costs are already covered by the vessel's primary task, so platform of opportunity surveys can be very cost-effective. Using platforms of opportunity largely eliminates vessel costs, but this highlights the importance of cost-effective analysis of large passive acoustic datasets, because the project budget is then dominated by analyst time rather than survey costs.
Long-term platform of opportunity projects and autonomous static recorders can collect datasets that extend over months or years. Data volumes are further increased by the high sample rates required to capture high frequency vocalisations, such as the ~130 kHz echolocation clicks of harbour porpoise (Clausen et al., 2011; Villadsgaard et al., 2007) and Cephalorhynchus dolphins (Kyhn et al., 2009).
In the absence of automated detectors, data are typically processed by an analyst viewing a short-term Fourier transform (STFT) on a spectrogram display and listening to sections of interest. This takes considerable time, particularly for high frequency data, which must be played back slower than real time for an analyst to hear them. Semi-automated analysis reduces the manual effort required from analysts, reserving their time for making the final decisions on targeted data.
Automation can also reduce the biases and errors that often arise with human analysts (Aide et al., 2013; Heinicke et al., 2015) and can extract additional information from the data, such as bearings calculated from time of arrival differences for signals received on multiple hydrophones. A suite of automatic processes is currently available for analysing acoustic data for a range of species, including click detectors and click classifiers (Gillespie et al., 2008; Madhusudhana et al., 2015; Miller & Miller, 2018), energy band comparisons (Klinck & Mellinger, 2011), extraction of spectral features (Gillespie et al., 2013; Lin & Chou, 2015) and, more recently, machine learning methods (Bergler et al., 2019; Bermant et al., 2019; Jiang et al., 2018; Shamir et al., 2014). These methods differ in their computational requirements, their performance and their ability to process sounds from a range of species, the wider environment and anthropogenic sources.
Density estimation methods based on distance sampling (Buckland et al., 1993), which were first developed for visual surveys, generally deal well with missed detections because they directly estimate the probability of detection as a function of distance from the track line. As long as the probability of detection is high close to the survey track, missed detections (false negatives) at greater distances are of no consequence, since the reduction in detections is accounted for by a reduction in the estimated detection probability (González et al., 2018; Thomas & Marques, 2012). False positives, by contrast, inflate the number of detections. Marques et al. (2009) showed that acoustic data can still be used if the false positive rate is known, but this rate is likely to vary between datasets with the characteristics of interfering noise. Therefore, whether the analyst should examine all detections and remove false ones, or examine a subset and estimate the fraction of detections that are false positives, depends on the balance between analyst effort and statistical robustness.
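Where only a subset of detections is audited, the estimated false positive proportion can be used to scale the raw detection count. The following is a minimal sketch under simple assumptions: the audited detections are a simple random sample of all detections, and the uncertainty is the binomial standard error of a proportion. It is not the full density estimator of Marques et al. (2009), and the function name is hypothetical.

```python
import math

def corrected_count(n_total, audited_labels):
    """Scale a raw detection count by a false positive proportion estimated
    from a random audited subset (hypothetical helper).

    n_total        -- total number of automatic detections
    audited_labels -- booleans for the audited subset (True = false positive)
    Returns (estimated true detections, false positive proportion,
    binomial standard error of that proportion).
    """
    m = len(audited_labels)
    if m == 0:
        raise ValueError("audit at least one detection")
    fp_rate = sum(audited_labels) / m
    se = math.sqrt(fp_rate * (1 - fp_rate) / m)
    return n_total * (1 - fp_rate), fp_rate, se
```

The trade-off described above shows up directly in the standard error: auditing fewer detections saves analyst time but widens the uncertainty on the correction.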
Sperm whales lend themselves well to acoustic surveys. Their loud clicks can be heard at distances of several km, and they can be tracked and localised from a moving vessel with very modest equipment using target motion methods. Several studies have published abundance estimates for sperm whales using standard line transect survey approaches (Lewis et al., 1998, 2007, 2018) or methods with …
Both programs combine a simple transient click detector with a sophisticated user interface that enables an operator to efficiently select and group click trains on consistent bearings, which are likely to come from an individual or a closely associated group. The click detector generally produces many false positive detections, which may come from a variety of sources: other cetacean species, propeller and engine noise from the survey vessel and other craft, and other naturally occurring sounds such as breaking waves. It is therefore necessary for an operator to examine every screen page of data to eliminate false detections. However, because only detections are displayed, each page can cover a longer period and be much less cluttered than a standard spectrogram, and additional information, such as bearings to detected sounds, is displayed. This makes it possible for an operator to scan data offline at many times real-time speed.
In this study, we analysed a total of 1,696 hr of continuous data collected from a hydrophone array deployed from a platform of opportunity while it made routine passages for other purposes. It is hoped that this opportunistic PAM data collection will be the start of a long-term project to collect PAM data on a wide scale, thereby contributing to worldwide cetacean population monitoring efforts.
Data were processed to extract multiple classes of cetacean sounds, including sperm whale echolocation clicks, broadband delphinid echolocation clicks, narrow band high frequency (NBHF) echolocation clicks and delphinid whistles. We report on the distribution of these sound types and provide detection functions for sperm whales and delphinids along the survey track. Importantly, we demonstrate how a carefully selected combination of automatic and manual processing allows large datasets to be processed time-efficiently.

| Data collection
Acoustic data were collected opportunistically on board M/V Arctic Sunrise during passages in the Atlantic, Southern, Arctic and Indian Oceans using a towed hydrophone array (Vanishing Point Ltd; Figure 1) and recording system (… Instrumentation Ltd), in which analogue filtering and gain were applied before each channel was sampled at 500 kHz. A high pass filter of 10 Hz and a gain of 6 dB were applied to each of the 'medium frequency' channels 0 and 1, while a high pass filter of 2 kHz and a gain of 12 dB were applied to each of the 'high frequency' channels 2 and 3. Data from the SAIL acquisition card were written as four-channel .wav files using PAMGuard (Gillespie et al., 2008; available at www.pamguard.org), which also carried out real-time acoustic processing, displayed results and logged the ship's location from GPS.
FIGURE 1 Schematic of the towed PAM array and recording system used on board M/V Arctic Sunrise during acoustic surveys.

| Click processing
The raw wav data files were reprocessed onshore in conjunction with GPS data collected during the survey using PAMGuard (version 2.01.05).

| Detection and detector configuration
Odontocete clicks were detected on recordings from the high frequency hydrophones (channels 2 and 3) using a PAMGuard click detector module. Time of arrival differences for the signal on the two hydrophones were used to estimate an angle of arrival for each detected click relative to the hydrophone array.
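For a two-element array, the time delay Δt between hydrophones separated by d metres constrains the angle θ between the array axis and the incoming sound through cos θ = cΔt/d, where c is the speed of sound. The sketch below illustrates this relationship under stated assumptions: a nominal sound speed of 1,500 m/s, bearings measured from the forward array axis, and a hypothetical 0.25 m hydrophone spacing (the spacing used in this study is not given here). It ignores refraction and array curvature.

```python
import math

SPEED_OF_SOUND = 1500.0  # m/s; nominal seawater value (assumption)

def bearing_from_delay(delay_s, spacing_m, c=SPEED_OF_SOUND):
    """Angle in degrees between the forward array axis and the arriving sound.

    delay_s is the arrival time on the rear hydrophone minus the arrival time
    on the front hydrophone, so a sound from dead ahead gives +spacing/c (0 deg)
    and a sound from broadside gives 0 s (90 deg).
    """
    cos_theta = c * delay_s / spacing_m
    # Measurement noise can push |cos_theta| slightly above 1; clamp to a valid range
    cos_theta = max(-1.0, min(1.0, cos_theta))
    return math.degrees(math.acos(cos_theta))

# Hypothetical 0.25 m spacing: the largest possible delay is 0.25 / 1500 s (about 167 us)
print(bearing_from_delay(0.0, 0.25))  # broadside, about 90 degrees
```

Because only the angle to the array axis is measured, a sound at 60° to port and one at 60° to starboard give the same delay, which is the left-right ambiguity discussed under Localisation.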
To achieve a good compromise between detection efficiency and processing workload, an exploratory analysis was conducted on a representative subset of data from transits 1 and 2 to determine the PAMGuard click detector settings that enabled detection of all manually identified vocalisations while removing as many detections from noise sources as possible. This subset was representative of typical encounters identified during an initial pass through the data. The vessel propulsion system (propeller and engine noise) was the source of many false detections.
PAMGuard allows detections within a specified range of bearings to be vetoed. All detections in a 40° sector ahead of the vessel, with bearings between −20° and +20°, were discarded. To further reduce false detections from background noise, detector trigger thresholds between 10 and 19 dB were tested in 3 dB steps, and the threshold that removed the most false detections while still retaining all but one odontocete click train was chosen. This optimised threshold was used to reanalyse all the recordings, and the timing, bearing and waveform information for the detected clicks was written to PAMGuard output files for further review and analysis.
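The bearing veto and the threshold selection rule can be expressed in a few lines. This is a hypothetical sketch, not PAMGuard configuration: it assumes bearings are measured from the forward array axis (0° = dead ahead), so that the ±20° sector corresponds to angles below 20°, and it assumes each tested threshold has been summarised as a count of noise detections removed and a count of manually identified click trains missed.

```python
VETO_HALF_WIDTH_DEG = 20.0  # +/-20 deg sector ahead of the vessel, as described above

def is_vetoed(bearing_deg):
    """True if a click falls inside the forward veto sector.

    Assumes bearings are measured from the forward array axis (0 deg = ahead),
    so both sides of the +/-20 deg sector map to angles below 20 deg.
    """
    return bearing_deg < VETO_HALF_WIDTH_DEG

def choose_threshold(trials, max_missed_trains=1):
    """Pick the detector threshold that removes the most noise detections
    while missing no more than max_missed_trains manually identified trains.

    trials maps threshold (dB) -> (noise_detections_removed, trains_missed).
    """
    eligible = {thr: removed for thr, (removed, missed) in trials.items()
                if missed <= max_missed_trains}
    if not eligible:
        raise ValueError("no threshold meets the retention requirement")
    # Most noise removed; ties are broken in favour of the lower (more sensitive) threshold
    return max(eligible, key=lambda thr: (eligible[thr], -thr))
```

The default of one permitted missed train mirrors the "all but one odontocete click train" criterion in the text; the counts passed to `choose_threshold` would come from the exploratory analysis of the data subset.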
Cetacean echolocation clicks typically occur in trains with fairly consistent and characteristic inter-click intervals. Thus, trains of clicks on a consistent bearing are usually a more reliable cue than individual clicks. The 'Multi-Hypothesis Tracking (MHT) Click Train Detector' module within PAMGuard (Macaulay, 2020) was used to automatically group detected clicks into click trains. The module assesses the inter-click interval (ICI), amplitude, frequency content and bearing of clicks to assemble putative click trains and then calculates the likelihood that each possible combination of clicks is a true click train. As more clicks are included in the model, the number of possible click trains increases exponentially, so a pruning process is implemented so that only the most likely combinations of clicks are retained as putative click trains. Each click train is given a χ² score; the lower the score, the more likely it is that the clicks within the train come from the same source (target animal). This process is computationally intensive, and although pruning increases the efficiency of the model (Macaulay, 2020), removing as many false positive clicks as possible before running the MHT click train detector proved essential.
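A toy version of the scoring and pruning steps may help to show why consistent trains score low. This is a hypothetical simplification, not Macaulay's (2020) formulation: the score below only penalises changes in ICI and bearing between successive clicks (amplitude and spectral terms are omitted), and the tolerances `sd_ici` and `sd_bearing` are made-up values.

```python
def train_chi2(click_times, bearings, sd_ici=0.01, sd_bearing=2.0):
    """Simplified chi-squared-style score for a candidate click train.

    Sums squared, normalised changes in inter-click interval (ICI, s) and
    bearing (deg) between successive clicks; lower means more self-consistent.
    sd_ici and sd_bearing are hypothetical tolerances.
    """
    if len(click_times) != len(bearings) or len(click_times) < 3:
        raise ValueError("need matching times and bearings for at least 3 clicks")
    icis = [t2 - t1 for t1, t2 in zip(click_times, click_times[1:])]
    chi2 = sum(((b - a) / sd_ici) ** 2 for a, b in zip(icis, icis[1:]))
    chi2 += sum(((b - a) / sd_bearing) ** 2 for a, b in zip(bearings, bearings[1:]))
    return chi2 / (len(click_times) - 1)

def prune(hypotheses, keep=5):
    """Keep only the `keep` lowest-scoring hypotheses.

    hypotheses is a list of (score, clicks) tuples. Pruning like this stops the
    number of candidate trains growing as 2**n with n clicks.
    """
    return sorted(hypotheses, key=lambda h: h[0])[:keep]
```

A perfectly regular train on a constant bearing scores 0, while irregular spacing or jumping bearings raise the score; in a full multi-hypothesis tracker, each new click would spawn "include" and "exclude" branches for every retained hypothesis before pruning.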
Settings for the MHT click train detector were based on those suggested by Macaulay (2020) and adjusted iteratively for the data subset. The settings for the MHT click train detector and its classifier were then validated against a full manual analysis of the subset using existing MATLAB functions for PAMGuard (available at: https://github.com/PAMGuard/PAMGuardMatlab) within custom MATLAB scripts (version 9.9.0, MATLAB, 2020).
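The validation itself was carried out in MATLAB; the comparison logic can be sketched in Python, assuming automatic click trains and manually marked events are both reduced to (start, end) times in seconds. This is a hypothetical illustration, not the authors' scripts or the PAMGuard MATLAB library.

```python
def overlaps(a, b):
    """True if two (start, end) intervals in seconds overlap."""
    return a[0] < b[1] and b[0] < a[1]

def validate(auto_trains, manual_events):
    """Compare automatic detections with a manual audit.

    Returns (manual events with no overlapping automatic train, i.e. misses;
    automatic trains outside every manual event, i.e. candidate false positives).
    """
    missed = [e for e in manual_events
              if not any(overlaps(e, t) for t in auto_trains)]
    unmatched = [t for t in auto_trains
                 if not any(overlaps(t, e) for e in manual_events)]
    return missed, unmatched
```

An empty `missed` list corresponds to the outcome reported for delphinids in the Discussion, where no manually marked events were missed by the detector.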

| Classification
Following previous work to identify beaked whales in similar acoustic surveys (Keating & Barlow, 2013; Rone et al., 2014; Yack et al., 2010), two narrow band click classifiers with frequency sweeps were applied to detect beaked whales. The first used the PAMGuard defaults for beaked whales, with a test band between 24 and 48 kHz, and the second used a higher frequency test band (40-80 kHz) to search for higher frequency beaked whale clicks. The presence of a frequency sweep, assessed by eye in Wigner plots of individual clicks, was useful in identifying beaked whales. A narrow band classifier with a test band between 100 and 150 kHz was also used to detect narrow band high frequency (NBHF) clicks, providing a classifier for any NBHF species, such as harbour porpoise Phocoena phocoena, dwarf and pygmy sperm whales (Kogia spp.) and NBHF delphinids (e.g. Cephalorhynchus spp.). The classifier within the MHT module uses spectral template classifiers, which correlate the average spectrum of each click train with species-specific spectral templates, together with inter-click interval parameters. Classifiers were run within the MHT click train detector for sperm whales, beaked whales and dolphins.
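The two ideas in this paragraph, a band-energy test and correlation of a mean spectrum with a species template, can be sketched with NumPy. This is a generic illustration under assumptions, not PAMGuard's implementation: the control band, the Hann window and the use of a dB ratio are choices made for this example.

```python
import numpy as np

def band_energy_ratio_db(click, fs, test_band, control_band):
    """Energy in test_band relative to control_band, in dB (bands in Hz)."""
    spectrum = np.abs(np.fft.rfft(click * np.hanning(len(click)))) ** 2
    freqs = np.fft.rfftfreq(len(click), d=1.0 / fs)

    def band_energy(band):
        lo, hi = band
        return spectrum[(freqs >= lo) & (freqs < hi)].sum()

    # A tiny floor avoids log(0) if a band contains no energy at all
    return 10.0 * np.log10((band_energy(test_band) + 1e-30)
                           / (band_energy(control_band) + 1e-30))

def template_correlation(mean_spectrum, template):
    """Pearson correlation between a click train's mean spectrum and a species template."""
    return float(np.corrcoef(mean_spectrum, template)[0, 1])
```

For example, a synthetic 130 kHz tone burst sampled at 500 kHz gives a large positive ratio for the 100-150 kHz NBHF test band against a 20-80 kHz control band, whereas a 40 kHz burst gives a strongly negative one.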

| Whistle processing
PAMGuard's 'Whistle and Moan Detector' (Gillespie et al., 2013) was run to detect odontocete whistle contours up to 24 kHz on the wav data files from the 'medium frequency' hydrophone pair (decimated to 48 kHz), using settings provided in Gillespie et al. (2013). The detector identifies tonal sounds within recordings using a multi-stage process which removes noise, calculates an FFT, applies an amplitude threshold and joins narrow band peaks in FFTs which are close in time and frequency to show 'whistle contours'.
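The stages named above (noise removal, thresholding, and joining peaks that are close in time and frequency) can be illustrated with a deliberately simplified tracker. This is a sketch under assumptions, not the Whistle and Moan Detector's actual algorithm: it starts from a precomputed spectrogram in dB, uses a per-bin median as the background estimate, a fixed dB threshold and nearest-bin linking, and all parameter values are hypothetical.

```python
import numpy as np

def detect_contours(spec_db, threshold_db=6.0, max_jump_bins=2, min_length=5):
    """Very simplified tonal contour tracker.

    spec_db: 2-D array (n_frames, n_bins) of spectrogram levels in dB.
    Returns a list of contours, each a list of (frame, bin) points.
    """
    # 1. Noise removal: subtract a per-frequency-bin median background
    cleaned = spec_db - np.median(spec_db, axis=0)
    # 2. Threshold: keep bins that stand out from the background
    above = cleaned > threshold_db
    finished, active = [], []
    for t in range(above.shape[0]):
        peaks = [int(p) for p in np.flatnonzero(above[t])]
        used, still_active = set(), []
        # 3. Join: extend each active contour with the nearest unused peak
        for contour in active:
            last = contour[-1][1]
            near = [p for p in peaks if p not in used and abs(p - last) <= max_jump_bins]
            if near:
                p = min(near, key=lambda q: abs(q - last))
                used.add(p)
                contour.append((t, p))
                still_active.append(contour)
            else:
                finished.append(contour)  # contour ends when no peak continues it
        # Peaks not claimed by an existing contour start new ones
        still_active += [[(t, p)] for p in peaks if p not in used]
        active = still_active
    finished += active
    # Discard very short fragments (e.g. isolated clicks or noise spikes)
    return [c for c in finished if len(c) >= min_length]
```

The `min_length` filter is what lets a broadband transient, which occupies a single time frame, be rejected while a sweeping whistle survives as one contour.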
Whistle contours were then classified to species level using PAMGuard's whistle classifier (Gillespie et al., 2013). The classifier breaks the detected contours into fragments of equal length and extracts the mean frequency (Hz), frequency slope (Hz/s) and curvature (Hz/s²) of each fragment. The distributions of these parameters are then calculated across the whistles in an encounter. The mean, standard deviation and skew of these parameters have been shown to vary between species (Gillespie et al., 2013).
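The fragment measurements can be sketched as follows, assuming a contour is given as time and frequency arrays, and that slope and curvature are taken from a quadratic fitted to each fragment; PAMGuard's exact definitions may differ. The default of 30 points per fragment mirrors the 30-bin fragment length reported for the training contours.

```python
import numpy as np

def fragment_features(times_s, freqs_hz, frag_len=30):
    """Split one contour into non-overlapping fragments of frag_len points and
    return rows of (mean frequency Hz, slope Hz/s, curvature Hz/s^2).

    A quadratic f(t) = a t^2 + b t + c is fitted with t centred on the fragment,
    so b is the slope at the fragment centre and 2a is the curvature.
    """
    t = np.asarray(times_s, float)
    f = np.asarray(freqs_hz, float)
    rows = []
    for start in range(0, len(t) - frag_len + 1, frag_len):
        tt = t[start:start + frag_len]
        ff = f[start:start + frag_len]
        a, b, _ = np.polyfit(tt - tt.mean(), ff, 2)
        rows.append((ff.mean(), b, 2 * a))
    return np.array(rows)

def summarise(features):
    """Mean, standard deviation and skewness of each feature column,
    pooled over all fragments in an encounter."""
    mean = features.mean(axis=0)
    sd = features.std(axis=0)
    skew = ((features - mean) ** 3).mean(axis=0) / np.where(sd > 0, sd, 1.0) ** 3
    return np.concatenate([mean, sd, skew])
```

A contour shorter than one fragment contributes no rows, which is one way an encounter can end up with too few fragments to classify.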
Thus, whistles can be classified using multiple whistle fragments, by comparing the distributions of fragment parameters measured during acoustic encounters with the distributions from contours of known species. Whistle classification was run separately for different geographical regions likely to have different combinations of 'whistling' species. In each region, the classifier was trained using pre-existing, pre-labelled whistle contours of species likely to be found in that region; however, the training data were not necessarily collected in that region.
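One simple way to compare an encounter's summary of fragment parameters with those of known species is a nearest-centroid rule on standardised features. This is a swapped-in illustration, not necessarily the method used inside PAMGuard's whistle classifier; the feature vectors would typically be the mean, standard deviation and skew summaries described above, and the function names are hypothetical.

```python
import numpy as np

def train_centroids(features, labels):
    """Standardise training feature vectors and compute one centroid per species."""
    X = np.asarray(features, float)
    mu, sd = X.mean(axis=0), X.std(axis=0)
    sd[sd == 0] = 1.0  # guard against constant features
    Z = (X - mu) / sd
    labels = np.asarray(labels)
    centroids = {str(s): Z[labels == s].mean(axis=0) for s in np.unique(labels)}
    return mu, sd, centroids

def classify(x, mu, sd, centroids):
    """Assign an encounter's feature vector to the nearest species centroid."""
    z = (np.asarray(x, float) - mu) / sd
    return min(centroids, key=lambda s: np.linalg.norm(z - centroids[s]))
```

Training one set of centroids per region, using only the species expected there, mirrors the regional approach described above: a species absent from the training set can never be predicted.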
The training contours, which were also used by Gillespie et al. (2013), had been sampled at 48 kHz, with the fragment length and section length parameters set to 30 bins (160 ms) and 60 fragments, respectively.
Where an event could not be classified because it had too few whistle fragments, frequency metrics such as mean whistle frequency and mean whistle slope were extracted for the event and used to aid species identification.
Identification to the family level was attempted where species level identification was not possible.

| Manual audit
All sections of data that contained click train and/or whistle detections were manually audited using the PAMGuard viewer displays and data map. The automatic classification of triggered click trains (e.g. sperm whale, beaked whale, NBHF, delphinid) was corrected where necessary. Echolocating odontocetes are most easily distinguished on the bearing-time display of the click detector, and click characteristics allow events to be placed into a species group. For example, the PAMGuard Wigner plot was used to verify the upsweep in beaked whale clicks (Papandreou-Suppappola & Antonelli, 2001; Yack et al., 2013). The MHT click train detector often fragmented a single click train into separate sections; in these cases, the analyst 'marked up' the trains more accurately (Figure 2).
As the Whistle and Moan Detector can trigger on any tonal sound within its detection range, detected contours were inspected to ensure that only those from delphinids were included in later analyses, and these were labelled using PAMGuard's 'Spectrogram Annotation' module. Where delphinid click trains and whistles overlapped in time, they were merged into delphinid encounter events.
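Merging click trains and whistles into encounter events amounts to merging overlapping time intervals. A minimal sketch, assuming each detection is reduced to a (start, end) pair in seconds; the optional `gap_s` tolerance is a hypothetical extension, since the text only describes merging on temporal overlap (which corresponds to `gap_s = 0`).

```python
def merge_events(intervals, gap_s=0.0):
    """Merge (start, end) detection intervals that overlap, or lie within
    gap_s seconds of each other, into encounter events."""
    merged = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1] + gap_s:
            # Overlaps the current event: extend it if necessary
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])
    return [tuple(e) for e in merged]
```

In use, click train and whistle intervals for delphinids would simply be concatenated into one list before merging, e.g. `merge_events(click_intervals + whistle_intervals)`, where both list names are hypothetical.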

| Localisation
Click trains were localised using PAMGuard's Target Motion Analysis (TMA) module with the two-dimensional simplex method. This finds the stationary source location that minimises the least-squares error between the bearings measured within a click train and those predicted from that location, estimating a separate location on each side of the track line. A simple two-element linear array was used in these transits, so the bearings calculated from time of arrival differences actually place the target on a semi-circular arc passing beneath the vessel's track line. Fortunately, line transect surveys are quite forgiving of this ambiguity. It has been shown (Leaper et al., 1992; Lewis et al., 2018) that when the perpendicular distances to detections are typically greater than the likely depth of the detected animal, the 'vertical ambiguity' can be accounted for by the detection function and has little impact on density estimation. Thus, for localisation purposes, bearings can be treated as horizontal. A left-right ambiguity still remains; however, distance sampling only requires the perpendicular distance from the track line (Buckland et al., 1993). Dolphins often bow ride and mill around the vessel, and target motion analysis clearly cannot be used in these cases. A 'delphinid' detection function was therefore calculated using a subset of detections that had well-defined click trains and clearly moved past the array.
FIGURE 2 Bearing-time windows within PAMGuard showing a sperm whale encounter over a 30-min period. Window A shows the fragmented trains produced by the click train detector, and window B shows the same event after a manual revision and mark-up process to identify single click trains for each vocalising whale where possible. Due to the overlap of click trains, especially at the upper and lower ends of the bearing scale, it is not always possible to distinguish between vocalising individuals.
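The least-squares idea behind target motion analysis can be illustrated with a simpler, swapped-in technique: a linear least-squares intersection of bearing lines, rather than PAMGuard's simplex minimisation of bearing errors. The sketch assumes a straight track along the x-axis, horizontal bearings (as argued above) measured from the forward array axis, and a stationary animal. It returns one solution per side of the track, reflecting the left-right ambiguity, and the perpendicular distance is the absolute y-coordinate of either solution.

```python
import numpy as np

def localise(vessel_xy, bearings_deg):
    """Least-squares intersection of bearing lines from a straight track.

    vessel_xy    -- (n, 2) array positions, with the track running along +x
    bearings_deg -- click bearings from the forward array axis (0 = ahead)
    Returns (port solution, starboard solution); the perpendicular distance
    to the track line is abs(y) for either one.
    """
    positions = np.asarray(vessel_xy, float)
    theta = np.radians(np.asarray(bearings_deg, float))
    solutions = []
    for side in (1.0, -1.0):  # left-right ambiguity: one solution per side
        directions = np.column_stack([np.cos(theta), side * np.sin(theta)])
        A = np.zeros((2, 2))
        rhs = np.zeros(2)
        for pos, d in zip(positions, directions):
            P = np.eye(2) - np.outer(d, d)  # projects onto the normal of this bearing line
            A += P
            rhs += P @ pos
        solutions.append(np.linalg.solve(A, rhs))
    return solutions
```

With noise-free bearings to a stationary animal the lines intersect exactly; with real bearings, the solution is the point closest (in the least-squares sense) to all bearing lines. Bow-riding dolphins violate the stationary-source assumption, which is why they were excluded.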
Detection functions were calculated using data from all transits, as the same vessel and equipment were used throughout the study.
Half-normal and hazard-rate models were explored for each species group, and the best model for each was selected based on Akaike information criterion (AIC) scores.
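The model comparison can be sketched for conventional line transect data truncated at a distance w, where the probability density of a perpendicular distance x is g(x)/μ and the effective strip half-width (ESHW) is μ = ∫₀ʷ g(x) dx. This is a sketch under stated assumptions, not the analysis used in this study: it fits the two key functions with no adjustment terms, uses a crude grid search instead of a proper optimiser, and the grid ranges are hypothetical. Dedicated distance sampling software would normally be used.

```python
import numpy as np

def half_normal(x, sigma):
    """Half-normal detection function g(x) = exp(-x^2 / (2 sigma^2))."""
    return np.exp(-x ** 2 / (2 * sigma ** 2))

def hazard_rate(x, sigma, b):
    """Hazard-rate detection function g(x) = 1 - exp(-(x / sigma)^-b)."""
    x = np.maximum(x, 1e-9)  # avoid 0 ** -b on the track line
    return -np.expm1(-(x / sigma) ** (-b))  # numerically stable 1 - exp(...)

def eshw(g, w, n=2001):
    """Effective strip half-width: trapezoidal integral of g over [0, w]."""
    xs = np.linspace(0.0, w, n)
    ys = g(xs)
    return float(np.sum((ys[1:] + ys[:-1]) * np.diff(xs)) / 2)

def log_lik(x, g, w):
    """Log-likelihood of perpendicular distances x (truncated at w): pdf = g(x) / ESHW."""
    return float(np.sum(np.log(np.maximum(g(x), 1e-300))) - len(x) * np.log(eshw(g, w)))

def fit_models(x, w):
    """Crude grid-search maximum likelihood fits.

    Returns {model name: (AIC, parameters, ESHW)}; the lowest AIC is preferred.
    """
    x = np.asarray(x, float)
    x = x[x <= w]
    sigmas = np.linspace(0.02 * w, w, 200)
    shapes = np.linspace(1.0, 6.0, 26)
    hn_ll, hn_s = max((log_lik(x, lambda d, s=s: half_normal(d, s), w), s) for s in sigmas)
    hr_ll, hr_s, hr_b = max(
        (log_lik(x, lambda d, s=s, b=b: hazard_rate(d, s, b), w), s, b)
        for s in sigmas for b in shapes)
    return {
        "half-normal": (2 * 1 - 2 * hn_ll, (hn_s,),
                        eshw(lambda d: half_normal(d, hn_s), w)),
        "hazard-rate": (2 * 2 - 2 * hr_ll, (hr_s, hr_b),
                        eshw(lambda d: hazard_rate(d, hr_s, hr_b), w)),
    }
```

The AIC penalty (2 per parameter) is what stops the more flexible two-parameter hazard-rate model from being chosen automatically. For an untruncated half-normal, the ESHW reduces to σ√(π/2), which is a useful check on the numerical integral.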

| RESULTS
A total of 1,696 hr of four-channel acoustic data were collected during more than 30,000 km of survey effort across the Atlantic, Southern, Arctic and Indian Oceans (Table 1). This resulted in 25.4 terabytes (TB) of 16-bit .wav files.
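A back-of-envelope calculation shows how quickly these data accumulate. The sketch counts raw sample data only (four channels at 500 kHz, 16-bit samples, 1,696 hr, decimal units) and ignores file headers and any other overheads, so it is only expected to be of the same order as the 25.4 TB reported rather than to match it exactly.

```python
CHANNELS = 4
SAMPLE_RATE_HZ = 500_000
BYTES_PER_SAMPLE = 2   # 16-bit samples
HOURS = 1_696

bytes_per_hour = CHANNELS * SAMPLE_RATE_HZ * BYTES_PER_SAMPLE * 3600
total_tb = bytes_per_hour * HOURS / 1e12  # decimal terabytes
print(f"{bytes_per_hour / 1e9:.1f} GB per hour, about {total_tb:.1f} TB in total")
```

At roughly 14 GB per recorded hour, even a single month of continuous recording at this sample rate exceeds 10 TB, which is why analyst time rather than storage quickly becomes the limiting cost.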

| Click processing
After exploratory analysis, a 16 dB click detection threshold was chosen in preference to the 10 dB threshold used for real-time processing and the other thresholds tested (13 and 19 dB), based on the number of odontocete clicks retained and the number of noise-originating clicks removed. Compared with a 10 dB threshold, the 16 dB threshold reduced the size of the binary files produced by PAMGuard by between 78.5% and 99.6% per transit (Table 2).

| Whistle processing
PAMGuard's whistle classifier took approximately 62 min to process the entire dataset. The mean correct classification rate in the training dataset varied between regional species groups (64.1%-82.9%) owing to the different mix of species included for each region. Confusion matrices showed that, within the training dataset, all species were correctly classified on most occasions. However, some species showed consistently high (>20%) misclassification rates across regional groups; for example, pilot whales were misclassified as killer whales between 29.7% and 32.5% of the time.
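Rates like these come from row-normalised confusion matrices, in which each row holds the classifier's decisions for one true species. A minimal sketch with hypothetical counts and labels; averaging the diagonal gives an unweighted mean correct classification rate, which may not be exactly how the rates above were computed.

```python
import numpy as np

def classification_rates(confusion, species):
    """Row-normalise a confusion matrix (rows = true species, columns = predicted).

    Returns the unweighted mean correct classification rate and a dict mapping
    (true, predicted) -> misclassification rate for every off-diagonal cell.
    """
    cm = np.asarray(confusion, float)
    rates = cm / cm.sum(axis=1, keepdims=True)
    mean_correct = float(np.mean(np.diag(rates)))
    errors = {(species[i], species[j]): float(rates[i, j])
              for i in range(len(species)) for j in range(len(species)) if i != j}
    return mean_correct, errors
```

Because each row is normalised separately, a species with few training encounters can show a high misclassification rate even if it contributes only a small number of errors overall.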

| DISCUSSION
The semi-automatic processes applied to the large acoustic dataset in this study used existing modules within PAMGuard. This methodology significantly improved acoustic analyst efficiency.
Semi-automated analysis methods, such as these, are essential for large-scale acoustic surveys of wild animal populations, where data may be gathered near continuously over months and can total tens of terabytes, and budgetary limitations mean that specialist analyst time must be used cost-efficiently.
We appreciate that the analysis of such data will never be completely optimised, for a number of reasons. All detection systems suffer from false positive and false negative detections. While a false positive is easily defined, the definition of a false negative in an acoustic survey is much more complicated. Lowering detection thresholds to detect more distant sounds will inevitably lead to a higher rate of false positives, while raising thresholds to remove false positives will inevitably increase false negatives. However, distance sampling methodology, as used in this study, generally deals well with false negatives (Thomas & Marques, 2012).
The detection threshold chosen for the click detector was appropriate for the somewhat noisy towing vessel. Although false positives can generally be removed through manual audit, the additional cost of processing more detections can outweigh the statistical benefit of a larger sample size. A simple 'optimal' configuration is therefore illusory unless we know what we are optimising for: is it the maximum …
It is likely that a mixture of automated processing and human validation will remain the standard method for acoustic data analysis for the foreseeable future. However, improved algorithms with higher efficiency and/or lower false alarm rates will reduce the amount of human effort required for data analysis. For example, Shiu et al. (2020) showed that a new algorithm for detecting the calls of North Atlantic right whales reduced the false alarm rate to the point where a human analyst could check 1 month of continuous data in 7.44 hr. However, the potential shortcomings of automated detection must be kept in mind, for example the potential to miss changes in vocalisations over time (Sirvic, 2015) and for poorly understood or rare species to be missed (Shiu et al., 2020).
The MHT click train detector produced a large number of false positive sperm whale click trains in the subset, but did not miss any sperm whale events, ensuring important data were not lost during the analysis. Detector settings were heavily influenced by the background noise from the towing vessel and so thresholds should be adjusted to suit the background noise within each particular study.
Validation against the fully manual analysis of the subset showed that all automatically detected delphinid click trains occurred within manually marked up delphinid events and no delphinid events were missed. These comparisons provided a high degree of confidence that few delphinid events would be missed using the MHT click train detector. This detector provides a streamlined way to detect delphinids in large datasets. Grouping individual clicks into click trains and measuring them improves classification and with continued development, there is potential to more accurately associate clicks to individuals (Macaulay, 2020). For sperm whales, reduction in detection range due to prevailing noise or propagation conditions is shown by the detection function and accounted for in density estimation. For whistles, where we are unable to use target motion analysis to estimate a detection function, it will be necessary to develop alternative methods to measure detection probability.
Although the whistle classifier has been shown to perform well, with Gillespie et al. (2013) reporting correct classification rates of up to 94.5%, it is likely that some classifications in this study were incorrect. There is a paucity of acoustic data for some species within the survey regions; thus, the data used to train the classifier may not have included all species present, nor reflected likely differences in whistles between regions within the same species (Erbs et al., 2017). To increase confidence in species classification, region-specific recordings of as many cetacean species as possible are required to retrain the classifier. This will be particularly important for localised studies estimating abundance or investigating habitat preference (Erbs et al., 2017). When training data become available, … (Fais et al., 2015; Gordon et al., 2020; Lewis et al., 2018).

FIGURE 3
The 3.3 km effective strip half-width (ESHW) reported here is narrower than those reported by other studies, which is likely due to higher levels of noise emitted by the towing vessel in this study. This detection function shows that a vocally active sperm whale within 1,500 m of the track line will be detected, and so g(0) for a vocal sperm whale is equal to 1.
However, sperm whales are known to have silent periods, which can be a function of social behaviour and may vary regionally (Jaquet et al., 2001; Whitehead & Weilgart, 1991), although data on these are limited.
The sperm whale detection function measured during this study showed relatively few detections immediately adjacent to the track line. This is a common characteristic of sperm whale acoustic detection functions derived from simple two-element towed arrays (Gordon et al., 2020). It has been suggested that this is a consequence of plotting in two dimensions using bearings that have an unknown vertical component. This effect will be most evident for sperm whales vocalising at depth close to the track line: TMA calculates the distance from the hydrophone to the animal, but a substantial component of this distance will be due to the animal's depth rather than its horizontal distance from the track line. Leaper et al. (1992) and Lewis et al. (2018) … (Mcloughlin et al., 2019; Zilli et al., 2014).
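The geometry behind this effect is simple. Assuming the TMA range is the slant range R to an animal at depth z, and ignoring hydrophone depth, the horizontal distance from the track line is sqrt(R² - z²). The sketch uses a hypothetical 800 m vocalising depth purely for illustration: at a 1,000 m slant range, the animal is then only 600 m from the track line horizontally, whereas at several kilometres the difference becomes small.

```python
import math

def horizontal_distance(slant_range_m, depth_m):
    """Horizontal distance from the track line given slant range and animal
    depth (hydrophone depth ignored)."""
    if depth_m > slant_range_m:
        raise ValueError("depth cannot exceed slant range")
    return math.sqrt(slant_range_m ** 2 - depth_m ** 2)

# Hypothetical vocalising depth of 800 m
for slant in (1000, 2000, 4000):
    print(slant, round(horizontal_distance(slant, 800)))
```

This is why the vertical ambiguity matters mainly for detections close to the track line and has little effect when most perpendicular distances are much greater than typical dive depths.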
The coverage provided by this initial survey is extremely broad, but sparse. The real value of initiatives like this will come once data have been collected for several years. Even so, interesting information on distributions in rarely surveyed areas is evident. Delphinids were detected frequently across every transit. Sperm whales were detected close to the shelf break and in oceanic waters in all transits except that in the Southern Ocean. Higher detection rates were evident off north-west Africa, South Africa, South America and Svalbard.
Sperm whales were also detected near seamounts such as Vema and …
Over the past decade, more reliable and cost-effective hardware and sophisticated software have enabled non-specialist researchers to conduct bioacoustic surveys. In the marine environment, such acoustic surveys can be conducted during opportunistic transits from a variety of survey platforms using highly automated and relatively inexpensive towed hydrophone systems. Detecting and classifying sounds in such data so that species distributions and densities can be inferred is a time-consuming process for specialist acoustic analysts. Our study provides a template for efficient analysis of such large-scale acoustic datasets, reducing the time required by specialist analysts and, ultimately, the cost of any acoustic-based study.

AUTHORS' CONTRIBUTIONS
K.F.T., D.G. and J.G. conceived the ideas and designed the methodology; T.L., K.F.T. and T.R. collected the data; T.W. and D.G. analysed the data; T.W. and D.G. led the writing of the manuscript. All authors contributed to the drafts and gave final approval for publication.

ACKNOWLEDGEMENTS
We thank the crew of the Arctic Sunrise for helping to collect these data during approximately 30,000 km of transits and campaign work in four oceans. In particular, we acknowledge the teamwork involved in the deployment, checking and retrieval of the hydrophone. We also thank José Antonio Vázquez Bonales and Kike Perez Gil for data collection (Amsterdam-Senegal). Operations Department, Greenpeace International. We also thank the anonymous reviewers for their comments which helped to improve the clarity of the manuscript.

CONFLICT OF INTEREST
The authors declare that they have no competing interests.

PEER REVIEW
The peer review history for this article is available at https://publons.com/publon/10.1111/2041-210X.13907.