Identification of fish sounds in the wild using a set of portable audio‐video arrays

Associating fish sounds with specific species and behaviours is important for making passive acoustics a viable tool for monitoring fish. While recording fish sounds in tanks can sometimes be performed, many fish do not produce sounds in captivity. Consequently, there is a need to identify fish sounds in situ and characterise these sounds under a wide variety of behaviours and habitats. We designed three portable audio‐video platforms capable of identifying species‐specific fish sounds in the wild: a large array, a mini array and a mobile array. The large and mini arrays are static autonomous platforms that can be deployed on the seafloor and record audio and video for one to two weeks. They use multichannel acoustic recorders and low‐cost video cameras mounted on PVC frames. The mobile array also uses a multichannel acoustic recorder, but mounted on a remotely operated vehicle with built‐in video, which allows remote control and real‐time positioning in response to observed fish presence. For all arrays, fish sounds were localised in three dimensions and matched to the fish positions in the video data. We deployed these three platforms at four locations off British Columbia, Canada. The large array provided the best localisation accuracy and, with its larger footprint, was well suited to habitats with a flat seafloor. The mini and mobile arrays had lower localisation accuracy but were easier to deploy, and well suited to rough/uneven seafloors. Using these arrays, we identified, for the first time, sounds from quillback rockfish Sebastes maliger, copper rockfish Sebastes caurinus and lingcod Ophiodon elongatus. In addition to measuring temporal and spectral characteristics of sounds for each species, we estimated mean source levels for lingcod and quillback rockfish sounds (115.4 and 113.5 dB re 1 μPa, respectively) and maximum detection ranges at two sites (between 10.5 and 33 m).
All proposed array designs successfully identified fish sounds in the wild and were adapted to various budget, logistical and habitat constraints. We include here building instructions and processing scripts to help users replicate this methodology, identify more fish sounds around the world and make passive acoustics a more viable way to monitor fish.

It is likely that many more species produce sounds, but their repertoires have not yet been identified (Looby et al., 2022). Fish can produce sound incidentally while feeding or swimming (e.g. Amorim et al., 2004; Moulton, 1960) or intentionally for communication (Bass & Ladich, 2008; Ladich & Myrberg, 2006). For example, fish sound spectral and temporal characteristics can convey information about male status and spawning readiness to females (Montie et al., 2016), or male body condition (Amorim et al., 2015). It has been speculated that some species of fish may also emit sound to orient themselves in the environment (i.e. by echolocation, Tavolga, 1977). As is the case for marine mammal vocalisations, fish sounds can typically be associated with a specific species and sometimes with specific behaviours (Ladich & Myrberg, 2006; Lobel, 1992). Recently, Parmentier et al. (2021) used fish sounds to identify a new cryptic species of humbug damselfish in French Polynesia. Several populations of the same species can also have different acoustic dialects (Parmentier et al., 2005). Consequently, researchers can measure the temporal and spectral characteristics of recorded fish sounds to identify which species of fish are present in a particular environment, to infer their behaviour and, in some cases, to potentially identify and track a specific population (Luczkovich et al., 2008).
Using passive acoustics to monitor fish can complement existing monitoring techniques such as net sampling (Portt et al., 2006), active acoustics (Godø et al., 2014) or acoustic tagging (Pittman et al., 2014). Passive acoustics presents several advantages: it is non-intrusive, can monitor continuously for long periods of time and can cover large geographical areas. However, to use passive acoustics to monitor fish, their sounds must first be characterised and catalogued under controlled conditions. This can be achieved in various ways. The most common way to identify species- and behaviour-specific sounds is to capture and isolate a single fish, or several fish of the same species, in a controlled environment (typically a fish tank) and record the sounds they produce (e.g. Riera et al., 2018, 2020). Such an experimental setup precludes sound contamination from other species and allows visual observation of the behaviour of the animal. While these studies provide important findings on fish sound production, they do not always capture the sounds that fish produce in their natural environments. To partially address this issue, other studies record fish in natural environments but constrained in fishing net pens to ensure they remain in sufficient proximity to the hydrophones (e.g. Cott et al., 2014). This also presents some challenges, as other fish species outside the pen can potentially be recorded.
Passively recording fish in their natural environment has many advantages, especially in terms of not disrupting the animals.
However, it provides less control over external variables and also presents many technical challenges. Remotely operated vehicles (ROVs) equipped with video cameras and hydrophones have been used by Sprague and Luczkovich (2004) and Rountree and Juanes (2010). Locascio and Burton (2015) deployed fixed autonomous passive acoustic recorders and conducted diver-based visual surveys to document the presence of fish species. They also developed customised underwater audio and video systems to verify sources of fish sounds and to understand their behavioural contexts. Most of these monitoring techniques are limited by high power consumption and data storage space requirements, and are typically only deployed for short periods of time. Cabled ocean observatories equipped with hydrophones and video cameras provide valuable data for more extended time periods but by their nature are constrained to fixed locations and are expensive to deploy and maintain (Sirovic et al., 2012; Wall et al., 2014). Rountree et al. (2006) noted the need for the research community to develop longer term and affordable autonomous video and audio recorders that are more versatile than the current technology and facilitate cataloguing fish sounds in situ.
A key consideration when cataloguing fish sounds in the wild is the ability to localise the sounds recorded. In most cases, having only a single omnidirectional hydrophone and a video camera is not sufficient: several fish can produce sounds at the same time, and it is important to know which fish in the video recording produced the sound. Although numerous methods have been developed for the large-scale localisation of marine mammals based on their vocalisations (see reviews in Adam & Samaran, 2013; Zimmer, 2011), only a handful of studies have been published to date on the localisation of fish sounds. Wilson et al. (2019), D'Spain and Batchelor (2006), Mann and Jarvis (2004) and Spiesberger and Fristrup (1990) localised distant groups of fish. Putland et al. (2018) localised individual oyster toadfish in two dimensions (2D) using a 20 m long linear array fixed to a dock. Parsons et al. (2009, 2010) and Locascio and Mann (2011) conducted finer scale three-dimensional (3D) localisation and monitored individual fish in aggregations. Gervaise et al. (2019), Ferguson and Cleary (2001) and Too et al. (2019) also performed fine-scale acoustic localisation on sounds produced by invertebrates. Fine-scale localisation is extremely valuable as it can not only be used with video recordings to identify the species and behaviour of the animals producing sounds, but can also be used to track movements of individual fish, estimate the number of vocalising individuals near the recorder and measure source levels of the sounds. The latter represents critical information needed to estimate the distance over which fish sounds can propagate before being masked by ambient noise (Locascio & Mann, 2011; Radford et al., 2015).

KEYWORDS
acoustic localisation, copper rockfish, fisheries, lingcod, passive acoustic monitoring, quillback rockfish, video cameras
Once fish sounds are catalogued, passive acoustics alone (without video recordings) can be used to monitor the presence of fish in space and time. Many soniferous fish species are of commercial interest, which makes passive acoustic monitoring a powerful and non-intrusive tool for conservation and management purposes (Davis et al., 2017; Gannon, 2008; Luczkovich et al., 2008; Rountree et al., 2006; Van Parijs et al., 2009). Sounds produced while fish are spawning have been used to document spatiotemporal distributions of mating fish (Bolgan et al., 2017; Lowerre-Barbieri et al., 2011; Luczkovich et al., 2008; Parsons et al., 2016, 2017; Sánchez-Gendriz & Padovese, 2017), and a 2017 study demonstrated how passive acoustics could be used to detect an invasive fish species in a large river system.
As described here, passive acoustics could be a very powerful tool to monitor fish populations and behaviour. However, its capabilities are currently limited significantly by the fact that many fish sounds have not yet been linked to specific species. This is, in part, because there is no readily available instrumentation capable of easily identifying sounds that fish produce in their natural habitat.
Here, we propose three audio-video array designs (with their associated analysis software) that address this important research need by localising fish sounds in 3D and matching localised sounds to individual fish captured by video cameras. These systems are portable, adapted to a variety of coastal habitats, and straightforward to build and replicate. By making such hardware and software accessible, we provide the necessary tools that will help expand the worldwide fish sound catalogue and therefore make passive acoustics a more viable tool to monitor fish populations.

| Description of the audio-video arrays
Three audio-video arrays, referred to as the large, mini and mobile arrays, were developed to acoustically localise and visually identify fish producing sounds. Each array was designed with different constraints in mind. The large array was configured for the most accurate 3D acoustic localisation, the mini array for easier deployments in constrained locations or on rough/uneven seafloors, and the mobile array for dynamic real-time spatial sampling over shorter time periods (hours rather than days or weeks). We provide detailed building instructions and deployment procedures for these three audio-video arrays in the Supporting Information (Supporting_Information.pdf).

| Large array
The large array is a static platform deployed on the seafloor that records audio and video data for one to two weeks (Figure 1). It uses six M36-V35-100 omnidirectional hydrophones (GeoSpectrum Technologies Inc.) connected to an AMAR-G3R4.3 acoustic recorder (JASCO Applied Sciences) with a PVC housing rated to 250 m depth. Four of the hydrophones (1-4 in Figure 1) are connected to the first acquisition board of the recorder via a 4 m long 4-to-1 splitter cable. The two other hydrophones (5 and 6 in Figure 1) are connected, via a 3 m long 2-to-1 splitter cable, to the second acquisition board of the recorder. The recorder is set to acquire acoustic data continuously as 30-min wave files, at a sampling frequency of 32 kHz, with a bit depth of 24 bits and with a predigitalisation analog gain of 6 dB. An external battery pack (BP) with 48 D-cell batteries is used to power the recorder, which, using this configuration, allows the system to acquire data for up to 35 days. An end-to-end calibration was performed for each hydrophone using a piston-phone type 42AA precision sound source (G.R.A.S. Sound & Vibration A/S) at 250 Hz. The system gain measured on all hydrophones was −167.3 ± 0.2 dB re FS/μPa, where FS is the full digitalisation scale (i.e. amplitude values between −1 and 1).

FIGURE 1 Large audio-video array. (a) Photograph of the large array deployed in the field. (b) Side view and (c) top view diagrams of the array with dimensions. The six hydrophones are represented by the numbered grey circles. The top and side video cameras are indicated by C1 and C2, respectively. Note that C1 is not represented in (c) for clarity. The acoustic recorder and its battery pack are indicated by AR and BP, respectively. Grey and red lines represent the PVC structure of the array (red indicating the square base of the array).
Positions of the hydrophones within the large array (Figure 1) were defined to maximise the accuracy of the acoustic localisation (see the optimisation procedure described in Section 2.3 and in the Supporting Information). Two low-cost autonomous FishCam cameras (Mouy et al., 2020) are used to record video inside the large array. One (C1) is located at the top of the array and is oriented downward towards the seafloor; the other (C2) is located on the side of the array, pointing horizontally towards hydrophone 4 (Figure 1). Each camera is set to record video continuously during the day (from 5:00 to 21:00 local time) and to shut down during the night. Video data are recorded as 300-s h264 files, with a frame rate of 10 frames per second, a resolution of 1600 × 1200 pixels and an ISO of 400. The two cameras emit distinct sequences of beeps at 3 kHz every 4 h for time-synchronising the video and acoustic data. The autonomy of the FishCams is storage-limited, depends on the underwater light conditions and typically ranges from 8 to 14 days (see Mouy et al., 2020). All instruments are secured to a tent-shaped PVC frame 2 m wide, 2 m long and 3 m high (Figure 1). All structural elements (PVC tubes) are perforated to avoid air pockets that could reflect sounds and degrade the localisation accuracy.

| Mini array
Like the large array, the mini array is a static platform deployed on the seafloor. It can record audio and video data for approximately 1 week and has a much smaller footprint than the large array.
The mini array uses four HTI-96-MIN omnidirectional hydrophones (High Tech Inc.) connected to a SoundTrap ST4300HF acoustic recorder (Ocean Instruments). The recorder is set to acquire temperature every 10 s and acoustic data continuously as 15-min wave files, at a sampling frequency of 48 kHz and with a bit depth of 16 bits. Using this configuration, the recorder has an autonomy of approximately 7 days. An end-to-end calibration was performed for each hydrophone using a piston-phone type 42AA precision sound source (G.R.A.S. Sound & Vibration A/S).

…

Acoustic localisation was tested in the field for all three arrays using controlled sound sources. A detailed description of each processing step and the results from the localisation of controlled sources can be found in the Supporting Information (Supporting_Information.pdf).

| Optimisation of hydrophone placement
For the large array, the placement of the hydrophones was defined so as to minimise the overall localisation uncertainty. This was achieved using the simulated annealing optimisation algorithm (Kirkpatrick et al., 1983), following the procedure developed in Dosso and Sotirin (1999). The optimisation consisted of finding the x, y and z coordinates of the six hydrophones (18 parameters) that minimise the average localisation uncertainty of 600 simulated sound sources placed on a 2 m radius sphere around the centre of the array. For the same array footprint (i.e. 2 m × 2 m, Figure 5a,b), the spatial capacity of the large array to localise with an uncertainty below 50 cm is more than seven times larger than that of the hydrophone array used in Mouy et al. (2018) (i.e. 33.5 m³ vs. 4.2 m³, Figure 5).
This means that fish sounds can be localised accurately even when the fish are located up to a meter outside the array. The simulated annealing approach was not used for the mini and mobile arrays as the placement of the hydrophones was mostly dictated by the mechanical constraints of these platforms. Further details on the simulated annealing process can be found in the Supporting Information (Supporting_Information.pdf).
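The idea behind the optimisation can be sketched in a few lines of Python. This is a simplified illustration, not the authors' implementation: it uses SciPy's `dual_annealing` in pure annealing mode, four hydrophones instead of six for speed, and a GDOP-style proxy (the trace of the pseudo-inverse of JᵀJ for the linearised TDOA model) in place of the full localisation-uncertainty computation described in the Supporting Information. The sound speed and bounds are assumed values.

```python
import numpy as np
from scipy.optimize import dual_annealing

C = 1485.0  # assumed speed of sound in seawater (m/s)

def mean_uncertainty(flat_coords, sources):
    """Average GDOP-like localisation-uncertainty proxy over simulated sources."""
    hydros = flat_coords.reshape(-1, 3)
    costs = []
    for s in sources:
        # Unit vectors from each hydrophone to the source
        u = (s - hydros) / np.linalg.norm(s - hydros, axis=1, keepdims=True)
        # Jacobian of the TDOAs (relative to hydrophone 0) w.r.t. source position
        J = (u[1:] - u[0]) / C
        costs.append(np.sqrt(np.trace(np.linalg.pinv(J.T @ J))))
    return float(np.mean(costs))

rng = np.random.default_rng(0)
# Simulated sources on a 2 m radius sphere around the array centre
pts = rng.normal(size=(30, 3))
sources = 2.0 * pts / np.linalg.norm(pts, axis=1, keepdims=True)

n_hydro = 4                                 # six in the large array (18 parameters)
bounds = [(-1.0, 1.0)] * (n_hydro * 3)      # keep hydrophones within the array frame
x0 = rng.uniform(-1.0, 1.0, size=n_hydro * 3)

result = dual_annealing(mean_uncertainty, bounds, args=(sources,), x0=x0,
                        maxiter=30, no_local_search=True, seed=1)
print("cost before/after:", mean_uncertainty(x0, sources), result.fun)
```

The optimised geometry is whatever hydrophone layout minimises the average uncertainty proxy over the simulated source positions; `result.fun` is never worse than the cost of the starting geometry.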

| Localisation capabilities
Because of their different hydrophone apertures, the three audio-video arrays do not have the same localisation capabilities. Errors in the time-difference-of-arrival (TDOA) measurements have a greater impact on the localisations performed with the mini and mobile arrays than with the large array. Figure 6 depicts the estimated 50-cm localisation uncertainty isolines for the three arrays using the same TDOA errors (i.e. 0.12 ms) and shows that both the mini and mobile arrays can only localise sounds with an uncertainty below 50 cm within a much smaller volume than the large array.
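The localisation itself is described in the Supporting Information; its core idea, finding the source position whose modelled TDOAs best match the measured ones, can be sketched as a nonlinear least-squares fit. This is a generic illustration, not the authors' solver: the hydrophone coordinates, sound speed and starting point below are assumptions.

```python
import numpy as np
from scipy.optimize import least_squares

C = 1485.0  # assumed speed of sound (m/s)

def model_tdoas(src, hydros):
    """TDOAs of hydrophones 1..N-1 relative to hydrophone 0 for a source at src."""
    d = np.linalg.norm(hydros - src, axis=1)
    return (d[1:] - d[0]) / C

def localise(measured_tdoas, hydros, x0=(0.0, 0.0, 1.0)):
    """Least-squares 3D position estimate from measured TDOAs."""
    res = least_squares(lambda s: model_tdoas(s, hydros) - measured_tdoas,
                        np.asarray(x0, dtype=float))
    return res.x

# Hypothetical 6-hydrophone geometry (m) and a simulated fish position
hydros = np.array([[1, 1, 0], [-1, 1, 0], [-1, -1, 0],
                   [1, -1, 0], [0, 0, 1.5], [0, 0, 3.0]], dtype=float)
fish = np.array([0.5, 0.3, 1.0])
tdoas = model_tdoas(fish, hydros)   # noise-free TDOAs for the demo
print(localise(tdoas, hydros))      # recovers ~[0.5, 0.3, 1.0]
```

With noisy TDOAs the same fit returns the maximum-likelihood position estimate (for Gaussian TDOA errors), and the local curvature of the residuals yields the localisation uncertainties discussed above.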

| Characterisation of identified fish sounds
Identified fish sounds are characterised by measuring their pulse frequency, pulse repetition rate and duration, where a pulse is defined as a positive/negative amplitude pair. Each fish sound is typically made up of one or more pulses. All measurements are performed on the waveform, as in Casaretto et al. (2015) (Figure 7). The pulse duration, T_pulse, is measured as the time separating the first two consecutive amplitude peaks of a pulse and is used to calculate the pulse frequency in hertz (i.e. 1/T_pulse). The pulse repetition interval, T_rep (also referred to as pulse interval in Casaretto et al., 2015), is measured as the duration between the first peaks of two consecutive pulses and is used to calculate the pulse repetition rate in pulses per second (i.e. 1/T_rep). The duration, T_dur, is the duration of the fish sound in seconds. There is only one duration measurement per fish sound; however, fish sounds with multiple pulses (typically grunts) have several measurements of pulse frequency and pulse repetition rate (e.g. Figure 7).
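As a worked example of these definitions, the sketch below derives pulse frequency and repetition rate from peak times picked on a waveform. The peak times are hypothetical inputs (peak picking itself is not shown), and the helper name is ours, not from the paper.

```python
import numpy as np

def characterise(first_peaks, second_peaks):
    """first_peaks[i] and second_peaks[i] are the times (s) of the
    two first amplitude peaks of pulse i of a fish sound."""
    t_pulse = np.asarray(second_peaks) - np.asarray(first_peaks)
    pulse_freq = 1.0 / t_pulse          # pulse frequency, 1/T_pulse (Hz)
    t_rep = np.diff(first_peaks)        # pulse repetition intervals, T_rep (s)
    pulse_rate = 1.0 / t_rep            # pulse repetition rate, 1/T_rep (pulses/s)
    return pulse_freq, pulse_rate

# Example: a grunt with three pulses, 5 ms between the two peaks of each
# pulse, and pulses starting 100 ms apart
f, r = characterise([0.0, 0.1, 0.2], [0.005, 0.105, 0.205])
print(f)  # pulse frequencies, ~200 Hz each
print(r)  # repetition rates, ~10 pulses per second
```

A grunt with n pulses thus yields n pulse-frequency measurements and n−1 repetition-rate measurements, but a single T_dur.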

| Estimation of source levels
The acoustic source levels are calculated for the localised fish sounds by applying estimated propagation loss values to the received levels (Urick, 1983):

SL = RL + PL, (1)

where SL is the source level, RL is the received level and PL is the propagation loss (all in dB re 1 μPa).

FIGURE 4 Overview of the processing workflow.

Received levels are calculated for each fish sound after converting the amplitude of the digitised signal, x(t), into acoustic pressure values, p(t), using

p(t) = x(t) × 10^(−S_g/20), (2)

where S_g is the system gain, in dB re FS/μPa, measured in the calibrations described in Section 2.1. The root-mean-square (RMS) received sound pressure level is then defined (in dB re 1 μPa) as

RL = 20 log10(p_RMS), (3)

where p_RMS is the RMS pressure of the fish sound in μPa. Source levels are calculated by assuming spherical spreading of the acoustic wave. Additionally, given the short distance between the hydrophones and the fish and the low frequency of fish sounds, absorption losses are considered negligible. Therefore, the propagation loss in Equation 1 is defined by

PL = 20 log10(R), (4)

where R is the distance in meters between the source (i.e. the localised fish) and the receiver (hydrophone). Source levels are estimated using data from the hydrophone closest to the fish location and by band-pass filtering the acoustic recording in the frequency band of the fish sound (fourth-order Butterworth filter).
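Putting these steps together, source level estimation can be sketched as below. The sampling rate and calibration gain are the values reported in Section 2.1, `sosfilt` stands in for whatever filter implementation the authors actually used, and the function name is ours.

```python
import numpy as np
from scipy.signal import butter, sosfilt

def source_level(x, fs, band, distance_m, gain_db=-167.3):
    """Estimate the source level (dB re 1 µPa) of a fish sound.

    x: digitised signal from the hydrophone closest to the fish (full scale ±1)
    band: (low, high) frequency band of the fish sound, in Hz
    gain_db: end-to-end system gain in dB re FS/µPa (calibration value)
    """
    p = x * 10.0 ** (-gain_db / 20.0)                # pressure in µPa
    sos = butter(4, band, btype="bandpass", fs=fs, output="sos")
    p = sosfilt(sos, p)                              # 4th-order Butterworth band-pass
    rl = 20.0 * np.log10(np.sqrt(np.mean(p ** 2)))   # RMS received level
    return rl + 20.0 * np.log10(distance_m)          # add spherical spreading loss
```

For example, a sound whose band-passed RMS pressure works out to 1 µPa, recorded 10 m from the localised fish, yields a source level of 20 dB re 1 µPa.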

| Estimation of detection ranges
Estimating detection range is key in designing passive acoustic monitoring programs as it helps to (1) define the distance over which fish sounds can be detected, (2) determine how many recorders are required for the area of interest and (3) assess if passive acoustic monitoring is suitable for an area, given its ambient noise conditions. If we assume that fish sounds with a received level below the ambient noise are not detectable (i.e. a detection threshold of 0 dB), and that sound waves spread spherically without absorption, then the maximum distance R_max at which fish sounds can be detected is estimated as

R_max = 10^((SL − NL)/20), (5)

where SL is the source level and NL is the noise level at the monitoring location.

FIGURE 5 Comparison of localisation uncertainties between the large array from this study and the array used by Mouy et al. (2018).
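A minimal sketch of this calculation follows; the noise level used in the example is illustrative only, not a measured value from the study.

```python
def max_detection_range(sl_db, nl_db):
    """Maximum detection range (m) assuming spherical spreading,
    no absorption and a 0 dB detection threshold."""
    return 10.0 ** ((sl_db - nl_db) / 20.0)

# e.g. a 113 dB re 1 µPa source in hypothetical 85 dB re 1 µPa ambient noise:
print(round(max_detection_range(113.0, 85.0), 1))  # ~25.1 m
```

Applying this with measured noise-level percentiles (rather than a single value) gives a range of detection distances for fluctuating ambient conditions, as done in Section 3.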

| Software implementation
All the detection, localisation and optimisation algorithms described in this paper were implemented in Python 3.8 using the library ecosound (Mouy, 2021), which relies on pandas (McKinney & Team, 2015), NumPy (Harris et al., 2020), scikit-learn (Pedregosa et al., 2011), Dask (Dask Development Team, 2016) and xarray (Hoyer & Hamman, 2017). Jupyter notebooks allowing the reproduction of the results from this study are available on the GitHub repository of this paper (see the Data Availability Statement section).

| Data collection in the field
The large, mini and mobile arrays were deployed at five sites off the east coast of Vancouver Island, British Columbia, Canada (Figure 8, Table 1). These sites were selected to cover a variety of habitats and fish species. Ogden Point is a well-known SCUBA diving and shore …

FIGURE 7 Waveform of a fish grunt composed of six pulses. The pulse duration, T_pulse, and pulse repetition interval, T_rep, are measured on the waveform representation of the fish sound to calculate the pulse frequency and pulse repetition rate, respectively. T_dur represents the duration of the sound.

For all three arrays, data were downloaded after recovery of the instruments and processed in post-analysis. Fieldwork operations did not require any licences or permits.

| RESULTS
This section shows examples of fish sounds that were identified in the field using each platform. Videos corresponding to Figures 9-14 can be found in the Supporting Information.

… sounds were localised on the seafloor near the centre of the array (i.e. below hydrophone 4), with a localisation uncertainty of less than 20 cm in all dimensions, and corresponded to the location of the lingcod (Figure 9b,c). We conclude that the recorded fish sounds were therefore emitted by the lingcod.

… located. Localisation uncertainties were less than 20 cm inside the array and less than 40 cm outside the array, which leaves no ambiguity that the fish sounds were produced by the quillback rockfish.

Figure 12 shows the acoustic localisation of five impulsive fish sounds while two copper rockfish were located in front of the mobile array. All sounds were localised at the front right of the array, near the seafloor (Figure 12b,c). Localisation uncertainties were less than 15 cm along the x axis and 10 cm along the z axis. Despite the greater localisation uncertainties in range (i.e. >30 cm along the y axis), the absence of other fish in the video within the boundaries of the localisation uncertainties confirms that the fish sounds in Figure 12a were produced by the copper rockfish in front of the mobile array. These impulsive sounds seemed to be associated with an agonistic behaviour.

Figure 13 shows the acoustic localisation of six impulsive fish sounds while a copper rockfish and a blackeye goby Rhinogobiops nicholsii were located in front of the mobile array. All sounds were localised at the front of the array, near the seafloor, and had localisation uncertainties of less than 5 cm and 10 cm along the x and z axes, respectively, and up to 20 cm along the y axis (Figure 13b,c). Given the proximity of the two fish and the larger localisation uncertainties in range, it is not possible to identify with certainty which fish produced the sounds.
| Estimated detection ranges

Table 3 shows the estimated detection ranges at the Mill Bay and Hornby Island locations where the large audio-video array was deployed (Figure 8, Table 1). The calculation was performed using a source level value of 113 dB re 1 μPa (as measured in Table 2) and noise levels measured at the middle hydrophone of the large array between 20 and 1000 Hz (i.e. the frequency band of the fish sounds) using the software PAMGuide (Merchant et al., 2015). Given that noise levels constantly fluctuate in time, the detection range was calculated for the minimum (L_min) and maximum (L_max) noise levels, as well as for the 5th, 50th and 95th percentile levels (L_5, L_50 and L_95, respectively). Detection range values in Table 3 show that at Hornby …

| DISCUSSION
Our results show that all three audio-video arrays can successfully identify fish sounds in the wild. Field tests using a controlled sound source for each platform ground-truthed the accuracy of the localisation results and confirmed that the instrumentation and analysis process are working correctly (see Supporting Information).
The large array provides the most accurate acoustic localisations (Figure 6) and, with its two video cameras, has the largest field of view. Hydrophone placement for this array was optimised using simulated annealing to minimise localisation uncertainties. This optimisation resulted in a hydrophone geometry different from the one used by Mouy et al. (2018) to localise fish sounds off Cape Cod.
Using this hydrophone configuration increased the spatial localisation capacity of the array by a factor of about seven over that used by Mouy et al. (2018) (for the same array volume) and allowed fish to be localised accurately (i.e. localisation uncertainty <50 cm) at up to 3 m from the centre of the array (Figure 6). The mini and mobile arrays have much smaller footprints, and most of the sound sources to localise are outside the array. Consequently, small errors in the measurement of the time difference of arrival lead to large errors in the localisation results. Nevertheless, these two arrays are capable of determining the bearing and elevation of the sound source, which, in many cases, is enough to confirm that the sounds recorded are emitted by the fish in front of the camera (e.g. Figure 12). In some circumstances, when several fish are located along the same bearing angle from the mini or mobile arrays, the larger localisation uncertainties in range do not allow definitive identification of the fish producing the sound (e.g. Figure 13). This is typically not an issue if the fish are from the same species.
Although attempting to identify fish sounds in the wild using a single hydrophone and a single camera is relatively inexpensive and logistically easy, our study shows the importance of having several hydrophones (or directional sensors) for performing acoustic localisation. Inferring which individual fish produces the sounds based only on visual observations from video footage is prone to errors and can lead to assigning sounds to the wrong fish species. The case presented in Figure 10 … arrays are less affected by currents. From a logistics perspective, the large array is the most complex to deploy because of its size. The mini array is smaller and therefore much easier to deploy. The mobile array is the easiest platform to deploy as it only requires a single person piloting the underwater ROV. Cost-wise, the large array uses six hydrophones and a high-quality multichannel acoustic recorder, which makes it the most expensive platform (~USD 40,000). The mini and mobile arrays are less costly (~USD 8,000 and ~USD 11,000, respectively). In terms of sampling, both the large and the mini arrays are static platforms that are deployed over several weeks at a time. This long deployment duration allows non-intrusive observation and measurement of fish sounds related to a variety of behaviours.
The small home ranges of some fish species (Tolimieri et al., 2009), and the static nature of the large and mini arrays, mean these platforms may only sample sounds from a small set of individuals. If this is an issue, carrying out short deployments at different locations may be preferable to performing a series of longer deployments at the same location. The mobile array can sample several individuals over a larger spatial area but can only record for a few hours. The mobile array is also more intrusive than the two other arrays and more often elicits aggressive behaviours.
Consequently, the mobile array may sample a more restricted set of acoustic behaviours. As demonstrated in Section 3, all arrays can successfully attribute sounds to individual fish and therefore measure the temporal and frequency characteristics of sounds emitted by specific fish species. The large array provides accurate localisation over greater distances (Figure 6), captures a large field of view and is consequently the preferred platform for estimating source levels. Source levels can also be estimated using the mini and mobile arrays when the fish is close to or inside the array (e.g. Figure 11). However, the larger localisation uncertainty in range for fish located farther away (e.g. Figure 14) or to the side of the array (see Section 4.3 in the Supporting Information document) may not allow source levels to be estimated accurately. Figure 15 illustrates the constraints, strengths and weaknesses of each audio-video array. When used in unison, these platforms can cover many different habitats, species and logistical constraints. There is also a growing interest in exploring sounds produced by fish in deeper habitats (Bolgan & Parmentier, 2020; Mann & Jarvis, 2004).
Due to the low light conditions in the deep ocean, the design of our arrays would need to be modified to sample in such environments. External LED lights could be added to the FishCam on the array and could be controlled via its onboard single board computer (Mouy et al., 2020). Alternatively, low-light cameras could be used (Pagniello et al., 2021). There is also a strong interest in using passive acoustics for monitoring fish in tropical waters (e.g., coral reefs) where the density of fish and fish sounds is typically much greater than in British Columbia (Looby et al., 2022). In these environments, our audio-video arrays may not be as effective at associating specific sounds to individual fish (due to their close proximity), but they should allow the association of sounds to species. Audio-video arrays with more hydrophones would be recommended in such environments to maximise localisation accuracy.
These arrays could also be further developed by improving the data processing workflow. Currently, the audio and video data are only weakly linked. This could be improved by projecting the bearings and associated uncertainties of the localised sounds onto the video data, which would further help in associating sounds with individual fish. In very shallow environments, fish sounds could be received with multipath reflections from the surface and bottom boundaries. In such cases, measuring the TDOAs using the full waveform (as we do here) may become inaccurate and degrade the localisation accuracy. Consequently, in these environments, it would be preferable to measure TDOAs using just the first amplitude peak of the waveform, which is more likely to capture the direct path of the sound.
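For reference, full-waveform TDOA measurement between two channels is commonly done by picking the peak of their cross-correlation; a minimal sketch under the assumption of clean, single-path signals (the pulse and sampling rate below are synthetic):

```python
import numpy as np

def tdoa_xcorr(ref, sig, fs):
    """Time delay (s) of sig relative to ref, from the cross-correlation peak."""
    corr = np.correlate(sig, ref, mode="full")
    lag = int(np.argmax(corr)) - (len(ref) - 1)
    return lag / fs

fs = 32000
n = np.arange(64)
pulse = np.hanning(64) * np.sin(2 * np.pi * 300 * n / fs)
a = np.zeros(4096); a[100:164] = pulse   # channel 1
b = np.zeros(4096); b[130:194] = pulse   # same pulse arriving 30 samples later
print(tdoa_xcorr(a, b, fs))              # ~30/32000 s
```

In multipath conditions, the cross-correlation peak can jump to a surface- or bottom-reflected arrival, which is why measuring the delay between first amplitude peaks, as suggested above, can be the more robust choice in very shallow water.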
Lingcod have not previously been documented to produce sound.
Kelp greenling Hexagrammos decagrammus, which belongs to the same family as lingcod (Hexagrammidae) and also lacks a swim bladder, has been reported to have muscles possibly responsible for sound production (Hallacher, 1974), but its sounds have not been recorded. A number of rockfish species have been reported to have sonic muscles (Hallacher, 1974) and some have been documented to produce sounds (Nichols, 2005; …). Note that the sound measurements presented in this study aim to illustrate the type of information that can be measured with the audio-video arrays. A comprehensive description of the variability of the sounds from each species, based on the analysis of the entire dataset we collected, will be the subject of a future study.
In this paper, we proposed three audio-video array designs and demonstrated that they can be used to successfully identify fish sounds in the wild in a variety of coastal habitats. We also provided detailed building instructions and processing scripts that allow this work to be easily replicated. Our contribution fills a current research gap and will help expand the worldwide fish sound catalogue and therefore make passive acoustics a more viable tool to monitor fish populations.

AUTHOR CONTRIBUTIONS
Xavier Mouy conceived the ideas, designed the methodology, …

CONFLICT OF INTEREST STATEMENT
There is no conflict of interest.

DATA AVAILABILITY STATEMENT
Data and processing scripts used in this paper are available in these online repositories:
• GitHub repository: https://github.com/xaviermouy/XAV-arrays. Contains Python scripts and Jupyter notebooks for reproducing the data processing performed in this study, as well as examples of field logs and checklists used in the field. A snapshot of the repository is also archived on Zenodo (Mouy, 2023). Licence: BSD-3-Clause.
• OSF data repository: https://osf.io/q8dz4. Contains the acoustic data, metadata and configuration files needed to reproduce the detection and localisation results in this study.