A low‐cost, long‐running, open‐source stereo camera for tracking aquatic species and their behaviours

Ecologists are now widely utilising video data to quantify the behaviours and interactions of animals in the wild. This process can be facilitated by collecting videos in stereo, which can provide information about animals' positions, movements and behaviours in three dimensions (3D). However, there are no published designs that can collect underwater 3D stereo data at high spatial and temporal resolutions for extended periods (days). Here, we present complete hardware and software solutions for a long-running, open-source, underwater stereo camera rig, costing £1337. This stereo camera can continuously record aquatic species and their behaviours/interactions in high resolution (1080 p and 30 fps) and in 3D, over multiple days. We provide full design guides for the cameras and a travel-friendly rig, and include guidance and open-source code for calibrating the cameras in space and time. We also show how these cameras could be used to track animals' body parts and positions, and how their size, posture and behaviour can be inferred. This stereo camera will facilitate the collection of high-resolution ecological and behavioural data, such as affiliative, agonistic or trophic interactions between species, which can inform us about the health and structure of ecosystems. These data will assist ecologists and conservationists in monitoring and understanding the impacts of current environmental pressures on ecosystem functioning.


| INTRODUCTION
Videos are being increasingly used to record animal behaviour and species interactions in natural environments over extended periods of time (Dell et al., 2014; Smith & Pinter-Wollman, 2021). These approaches have been facilitated by advancements in camera technology, allowing ecologists to capture the abundance, diversity and behaviour of species in remote field locations for multiple hours, if not days (Burton et al., 2015). In addition, advances in automated image-based tracking methods will result in large numbers of videos being autonomously analysed to quantify species presence, abundance and biomass. Moreover, tracking individuals' body parts, positions and postures (Couzin & Heins, 2022; Graving et al., 2019; Karashchuk et al., 2021) can also allow individual behavioural motifs and movements to be identified. In some cases, however, quantifying behaviour from video data requires three-dimensional (3D) positional data. These data can be captured from multiple cameras using stereophotogrammetry, allowing individuals' size, orientation and position in 3D space to be recorded (e.g. Straw et al., 2011). From these data, animals' movements (e.g. Satterfield et al., 2023; Vivancos & Closs, 2015), behaviours and interactions (e.g. Ballesta et al., 2014; Janisch et al., 2021), and physiology (e.g. Schiettekatte et al., 2022) can be inferred. Collecting 3D positional data therefore has broad applications for the fields of animal behaviour, applied conservation and ecology.
However, stereo data collection is often limited to ~1.5 h of continuous footage, owing to many cameras operating with short battery lives. To increase the likelihood, and decrease the effort, of capturing rarer interactions and full behavioural repertoires, cameras should ideally run continuously for multiple hours across multiple days. While underwater cameras with longer battery lives (e.g. 10 h plus) have been developed, these capture two-dimensional video data which are not suitable for 3D tracking, capture short image/video sequences at predetermined timepoints (e.g. one photo every 15 min: Bilodeau et al., 2022; twice-daily photos: Greene et al., 2020), or capture video at low frame rates (e.g. 10 frames per second: Mouy et al., 2020). There is, therefore, a need for a system that can continuously capture multi-day, in situ, 3D stereo-footage.
By filming aquatic species in 3D, we can infer their locations, orientations and movement trajectories in 3D space. These positional and movement data could then be used to autonomously quantify behaviours and interactions from the video data (e.g. through pose estimation, e.g. Mathis & Mathis, 2020, and/or through inferring behaviours like attraction/aggression via the relative fine-scale movement trajectories of individuals, e.g. Nathan et al., 2022). Such approaches are not yet widely utilised in field-based underwater systems but hold great potential for autonomously inferring animal behaviours and interactions from 3D aquatic stereo-footage.
Here, we present the complete hardware and software designs for an open-source underwater stereo camera, costing £1337, which can continuously record species and their behaviours/interactions in high resolution and in 3D over multiple days. We provide step-by-step details of the camera design, its open-source running software and the flat-packed, travel-friendly rig design. We also provide a complete guide on how to deploy and calibrate the cameras, and illustrate how 2D video data collected in situ can be used for 3D pose estimation and the tracking of individuals (using open-source software). Such data can then be used to infer behavioural interactions between species. We tested the equipment by collecting video capturing fish species interactions on Galapagos reefs, deployed in remote locations for up to 2 days. By providing an in-depth guide to these long-term underwater stereo cameras, including details concerning weighting, deployment and camera focusing, we aim to facilitate their use in aquatic field-based research.

| Camera hardware
Each video recording device (hereafter 'camera'; two in total on the stereo rig) consisted of five main components: a Raspberry Pi computer, a real-time clock, an external hard drive, a camera and a USB portable power bank (Figure 1a; Table 1). An optional microphone can be added to capture audio (Table 1; Figure 1a). Build guides and control codes can be found online (Dunkley et al., 2023). Each camera takes ~3 h to assemble. The modular nature of this camera design (i.e. using five main individual components) means that individual parts can be replaced at a low cost (Table 1) if breakages occur. All parts except the microphone (which requires soldering) function by plugging directly into the Raspberry Pi computer. Following Mouy et al. (2020), we designed our stereo cameras around Raspberry Pi components. Raspberry Pi is becoming an increasingly popular tool for biological studies, offering a range of applications (see Jolles, 2021). Cameras were run using a Pi Zero.
Compared to other models (e.g. 4B), the Pi Zero has lower processing power, meaning that it draws less power (0.8 W at idle versus ~3.4 W for the 4B; Jolles, 2021; Zwetsloot, 2019), thus using less battery and extending run time.
Videos could be recorded at up to 1920 × 1080 pixels at 30 frames per second (adjustable depending on requirements; see software code). While current cameras (e.g. GoPro Hero 10) can now film in 4K, 1080 p is sufficient for many behavioural observations (Figure 1c) and reduces the file sizes of videos. Recorded videos were stored to the external hard drive during filming (Table 1): a 1 TB hard drive could store ~130 h of video footage (at 1080 p and 30 fps). In contrast to Mouy et al. (2020), whose underwater camera saved videos to the Pi's MicroSD card (which is possible here for shorter deployments), the use of an external hard drive meant that file storage did not limit deployment length.
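The ~130 h/TB figure implies an average video bitrate, which can be used to size storage for longer deployments. A minimal sketch (the 130 h/TB value is taken from the text above; the rest is arithmetic, and the function names are illustrative):

```python
# Rough storage budget for the stereo camera. Assumption: a 1 TB drive
# holds ~130 h of 1080p/30fps footage, as stated in the text.

def implied_bitrate_mbps(capacity_tb: float, hours: float) -> float:
    """Average video bitrate (Mbit/s) implied by a capacity/duration pair."""
    bits = capacity_tb * 1e12 * 8            # decimal TB -> bits
    return bits / (hours * 3600) / 1e6

def recording_hours(capacity_tb: float, bitrate_mbps: float) -> float:
    """Hours of footage a drive can hold at a given average bitrate."""
    bits = capacity_tb * 1e12 * 8
    return bits / (bitrate_mbps * 1e6) / 3600

rate = implied_bitrate_mbps(1.0, 130)        # implied average bitrate
print(round(rate, 1))                        # ~17.1 Mbit/s
print(round(recording_hours(2.0, rate)))     # a 2 TB drive -> ~260 h
```

At this bitrate, doubling the drive capacity simply doubles the deployment length that storage alone would permit.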
The cameras were powered using USB power banks (Table 1; Figure 1a). Here, we trialled two sizes: a standard 26,800 mAh/99 Wh bank and a larger 30,000 mAh/150 Wh bank (the maximum size currently available). Standard 26,800 mAh power banks were sufficient for day deployments (and required shorter charge times between uses), while the larger banks were used for multiple-day deployments. For our longest deployment, where cameras were turned on for 31.5 h (18 h of recording and 13.5 h of idle mode), the battery packs (30,000 mAh) still had ~50% of their charge (two rigs deployed; batteries: 49%-52% charge remaining). With the current design, therefore, the cameras could run for a total of ~22 h of recording time (recording hours tested on land; users can set specific recording windows in the control code, see below). The capacity of power banks is likely to increase, further increasing the capabilities of the camera rig.
Power banks could be removed and replaced with charged banks via Velcro strips, although future upgrades to the design include external charging cables, which reduce the need for this. Using power banks removed the need for the specialised (and costly) chargers/housings required for the multiple individual lithium-ion batteries used previously (e.g. Favaro et al., 2012; Mouy et al., 2020; Purser et al., 2020).

| Camera housing
Each camera was housed in a four-inch pressure housing from Blue Robotics (Table 1; depth rated to 100 m). It is, however, possible to construct housings from tubing as a lower-cost alternative (e.g. PVC or acrylic, as in Bergshoeff et al., 2017; Mouy et al., 2020; Purser et al., 2020). We used Blue Robotics housings so that we could fit a dome port rather than a flat port (with the camera mounted centrally in the dome). Dome ports are advantageous over flat ports as they do not restrict the camera's field of view and they reduce image refraction, but they require further steps in camera focussing (see below) (She et al., 2019).

| Stereo camera rig
The rig uses a lightweight (~12 kg) flat design that can be quickly assembled (<1 h) and dismantled, facilitating transport between field sites/institutions (see Dunkley et al., 2023 for design files).

Table 1 notes: (c) Blue Robotics have updated the housing design since we built our cameras. The new tubes contain a pressure release valve, which reduces the risk of tube ends popping off when exposed to direct sunlight (equivalent cost US$339 for tube, end plate and dome), and the new aluminium tubes have a deeper depth rating (950 m). (d) Four additional 250 mm lengths can be used to create adjustable legs.
The distance and angles between the two cameras on the rig depend on what the cameras aim to capture, their field of view, and how far subjects are likely to be from the camera. More precise spatial measurements of objects can be obtained when cameras are placed further apart from each other (Boutros et al., 2015).
Therefore, to capture a scene of ~15 m maximum depth of field (dependent on water visibility, fish size, etc.), with a horizontal field of view of ~90° and a vertical field of view of ~70°, our rig has cameras fixed 700 mm apart with an inward angle of 7°. By mounting the cameras on fixed plates screwed to the frame (Figure 1b), this distance and angle were maintained each time the cameras were attached. Further holes could be drilled into the frame to change the distance and angle of the cameras depending on study requirements.
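The effect of baseline on precision can be illustrated with the standard rectified-stereo approximation, in which depth uncertainty grows with the square of distance and shrinks with baseline and focal length. A sketch using this rig's nominal geometry (the one-pixel matching error and the pinhole focal-length conversion are illustrative assumptions, not measurements from this study):

```python
import math

def depth_error(z_m: float, baseline_m: float, focal_px: float,
                disparity_err_px: float = 1.0) -> float:
    """Approximate depth uncertainty (m) for a rectified stereo pair:
    dz ~ z^2 * d(disparity) / (f * B)."""
    return (z_m ** 2) * disparity_err_px / (focal_px * baseline_m)

# Pinhole assumption: ~90 deg horizontal FOV over a 1920 px wide image
focal_px = (1920 / 2) / math.tan(math.radians(90 / 2))   # = 960 px

for z in (1, 3, 5):  # subject distances in metres
    print(z, "m:", round(depth_error(z, 0.7, focal_px) * 1000, 1), "mm")
```

Under these assumptions, a one-pixel matching error at 1 m corresponds to only ~1.5 mm of depth error with the 700 mm baseline, but grows to several centimetres at 5 m, which is why wider baselines are preferred for distant subjects.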

| Camera software
Each camera was controlled via its Raspberry Pi. Control codes are Python based (and require basic coding experience) and can be modified to fit user requirements (see Dunkley et al., 2023). Cameras were designed to automatically start and stop recording (in continuous, 30-min, time- and date-stamped sequential chunks) at specified time points (e.g. start at 6:30 am, finish at 5 pm). The camera could also be started manually at any time point via an ON/OFF switch on the external housing, and finish recording automatically at a specified time (e.g. 5 pm). When ON and not recording, the Pi remained in idle mode, using minimal battery, and would automatically restart filming at a specified time, facilitating multiple-day deployments. Camera timing was controlled by the installed real-time clock (Table 1), which holds time to ±2 parts per million, equating to a drift of 0.17 s per day (datasheets.maximintegrated.com/en/ds/DS3231.pdf), even when the Pi is not connected to the internet (e.g. when underwater). Any drift in time between the two cameras on a rig could be accounted for using an external flashing light visible in both cameras (see below).
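The scheduling logic described above can be sketched in plain Python. This is an illustrative reimplementation, not the authors' control code (see Dunkley et al., 2023 for that); the function names and the .h264 filename convention are assumptions:

```python
from datetime import datetime, time, timedelta

CHUNK = timedelta(minutes=30)  # length of each sequential recording chunk

def in_window(now: datetime, start: time, stop: time) -> bool:
    """True if `now` falls inside the daily recording window."""
    return start <= now.time() < stop

def chunk_end(now: datetime) -> datetime:
    """End of the current 30-min chunk, aligned to the half hour."""
    base = now.replace(minute=0, second=0, microsecond=0)
    n = (now - base) // CHUNK + 1      # index of the next half-hour boundary
    return base + n * CHUNK

def chunk_filename(now: datetime, camera_id: str) -> str:
    """Date- and time-stamped filename for a chunk starting at `now`."""
    return f"{camera_id}_{now:%Y%m%d_%H%M%S}.h264"

now = datetime(2022, 4, 1, 6, 47, 12)
print(in_window(now, time(6, 30), time(17, 0)))   # True
print(chunk_end(now))                             # 2022-04-01 07:00:00
print(chunk_filename(now, "left"))                # left_20220401_064712.h264
```

A control loop would check `in_window` each minute, sleep (idle) outside the window, and otherwise record until `chunk_end` before opening the next timestamped file.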
Video settings (e.g. quality, resolution, contrast, brightness, white balance) and audio settings (volume/sensitivity) can be adjusted by the user depending on requirements. Video and audio data are recorded separately and can be merged post-capture (e.g. using FFmpeg).
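The post-capture merge can be scripted, for example by shelling out to FFmpeg with a stream copy so the video is not re-encoded. A hedged sketch (the helper function and filenames are illustrative; the authors' own workflow may differ):

```python
import subprocess

def merge_cmd(video: str, audio: str, out: str,
              audio_offset_s: float = 0.0) -> list[str]:
    """Build an FFmpeg command that muxes a video file with an audio
    file, optionally delaying the audio by `audio_offset_s` seconds.
    `-c:v copy` keeps the video stream untouched (no re-encode)."""
    return ["ffmpeg", "-y", "-i", video,
            "-itsoffset", str(audio_offset_s), "-i", audio,
            "-c:v", "copy", "-c:a", "aac", out]

cmd = merge_cmd("left_20220401_063000.h264",
                "left_20220401_063000.wav",
                "left_20220401_063000.mp4")
print(" ".join(cmd))
# To actually run the merge: subprocess.run(cmd, check=True)
```

Building the command as a list (rather than a shell string) avoids quoting problems with unusual filenames.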
The camera code also includes a number of failsafes to restart the camera if an issue is detected or the camera crashes (see Dunkley et al., 2023).

| Camera focussing
Each camera lens requires manual focussing prior to deployment (by altering the lens' distance from the camera's sensor). While focusing on land is straightforward (by focusing at the hyperfocal distance), manual focusing for underwater use requires further consideration. This is particularly important if dome ports are used on the housing, as the dome creates a 'virtual image' in front of the camera. This means that for dome ports, the focus should be set much closer underwater than in air (She et al., 2019), with the distance depending on the focal length of the camera and the diameter of the dome. There is limited information on this and often a trial-and-error approach is required (see Dunkley et al., 2023 for links to guidance).
Here, we focused the camera on an object 17 cm away from the camera sensor. This measurement was based on a dome diameter of 87 mm and a lens focal length of 3.56 mm (F2.5). Since designing the cameras, a new camera module has been developed for the Raspberry Pi (Arducam 16 MP Camera Module), which allows the user to set the camera's focus in the code. This means that different focus values can be tested underwater prior to use, with the optimal value then fixed in the camera's run code.

| UNDERWATER STEREO CAMERA DEPLOYMENT
Stereo cameras are often deployed and retrieved from the surface using ropes (e.g. Langlois et al., 2020), but this creates uncertainty about what field of view the camera will be recording (as camera placement cannot be guaranteed). We recommend that for long-term deployments attempting to capture species and their behaviours, placement should be more guided (e.g. one-third benthos, two-thirds pelagic). We thus deployed our stereo camera rigs across the reefs of the Galapagos using SCUBA (March-May 2022; max depth 18.8 m, average ~12 m; n = 32 deployments; ~480 h of 3D footage; under Galapagos National Park research permit number PC-06-22). We trialled a number of methods to deploy and retrieve cameras (e.g. lowering with rope before diver placement; lifting/moving a weighted rig with a lift bag), but the most efficient, controlled and minimalist method involved one or two divers descending/ascending carrying an unweighted rig. It was possible to swim with an unweighted rig to find optimal placement locations, which will depend on the aim of the study. If retrieval was not possible through diving (i.e. if diving limits had been reached or the current was strong), a rope and buoy were attached to the rig upon deployment, and the rig was pulled up by hand at retrieval. This rope retrieval method facilitates more opportunistic camera drops, maximises camera filming time and minimises deployment costs (e.g. only one dive versus two), but the rope had to be carefully organised so that it did not obstruct the cameras' field of view (e.g. with changing tides) or snag on the benthos. Ropes and buoys can also be disturbed by passing boats and increase the risk of theft. Therefore, we did not always retrieve cameras via rope.
For deeper deployments (e.g. beyond SCUBA depths), or where SCUBA is not feasible, cameras could be deployed with ropes and buoys and retrieved from the surface.
Stereo rigs are often deployed with additional weights to ensure they remain at their deployed position. The camera rigs presented here weigh ~12 kg on land and were marginally negatively buoyant in seawater, sinking slowly. To fix the rigs in place, we attached an additional 12 kg (6 × 2 kg dive weights), which was sufficient to hold the rig in position in moderate current (~2 knots). These weights are not suitable for all environments (e.g. stronger currents), where additional weight would be required depending on conditions.

| Calibrating stereo cameras in time
The two cameras on the rig record video data independently, so the offset in frames between the two unsynchronised videos must be calculated prior to any 3D reconstruction. We include two methods on the rigs to calculate the temporal offset between the two cameras: a flashing light visible in both cameras' fields of view, and microphones capturing audio (Figure 1a,b). Audio streams can be used to calculate the temporal offset between two stereo cameras by cross-correlating peaks in the waveforms of the two cameras' streams (Hasler et al., 2009). A flashing light, however, is preferable for synchronising left and right camera images (Harvey et al., 2003), as audio can drift from video. We provide code for this (see Dunkley et al., 2023), which involves pinpointing the frame (for each camera) at which the light starts flashing and calculating the difference between the two cameras. The light flashes every second, so this calibration stage can be repeated at multiple timepoints within videos to account for any drift. We recommend that drift is calculated for every 30 min of video as a minimum.
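The audio-based offset estimate can be sketched as a simple cross-correlation, in the spirit of Hasler et al. (2009). This is a minimal illustrative version, not the published code; the synthetic delay below (one 30 fps frame at a 48 kHz sample rate) is for demonstration only:

```python
import numpy as np

def audio_offset_samples(left: np.ndarray, right: np.ndarray) -> int:
    """Number of samples by which `right` lags `left` (positive: the
    same event appears later in the right camera's audio stream)."""
    corr = np.correlate(right, left, mode="full")
    return int(np.argmax(corr) - (len(left) - 1))

# Synthetic check: identical noise, with the right channel delayed by
# 1600 samples (one 30 fps frame at 48 kHz).
rng = np.random.default_rng(1)
sig = rng.standard_normal(4800)
left = np.concatenate([sig, np.zeros(1600)])
right = np.concatenate([np.zeros(1600), sig])

lag = audio_offset_samples(left, right)
print(lag, "samples =", lag * 30 / 48000, "frames")   # 1600 samples = 1.0 frames
```

The sample lag is converted to a frame offset by multiplying by the video frame rate and dividing by the audio sample rate; for long recordings an FFT-based correlation (e.g. `scipy.signal.fftconvolve`) would be substantially faster than `np.correlate`.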

| Calibrating stereo cameras in space
Before synchronised video data can be used for 3D reconstruction, the stereo cameras must be calibrated in space (e.g. using a camera calibration toolbox; Bouguet, 2004). Videos collected through a flat port (e.g. GoPro) rather than a dome port are more difficult to calibrate, as the dome significantly reduces refraction. With underwater video, the effect of refraction needs to be modelled. This can be accounted for by the radial distortion component of the calibration, which is often set at three terms in the polynomial in current calibration software (Shortis, 2019). However, this can be set at up to six terms, which can account for extreme distortion and improve calibration accuracy.
We provide an open-source script which calibrates underwater stereo-video incorporating these six terms (Dunkley et al., 2023).
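For reference, the six-term ('rational') radial distortion model referred to above, as implemented in common calibration software (e.g. OpenCV's CALIB_RATIONAL_MODEL option), scales each normalised image coordinate by a ratio of two cubic polynomials in the squared radius. A minimal illustration (not the authors' calibration script):

```python
def radial_distort(x: float, y: float, k: list) -> tuple:
    """Apply the six-term rational radial distortion model to normalised
    image coordinates (x, y); k = (k1, k2, k3, k4, k5, k6).
    Scale factor: (1 + k1 r^2 + k2 r^4 + k3 r^6) /
                  (1 + k4 r^2 + k5 r^4 + k6 r^6)."""
    r2 = x * x + y * y
    num = 1 + k[0] * r2 + k[1] * r2**2 + k[2] * r2**3
    den = 1 + k[3] * r2 + k[4] * r2**2 + k[5] * r2**3
    return x * num / den, y * num / den

# With all six coefficients at zero the model reduces to the identity:
print(radial_distort(0.3, -0.2, [0.0] * 6))  # (0.3, -0.2)
```

The three denominator terms are what the extra three coefficients add over the usual three-term polynomial, letting the model bend back towards the optical axis under the strong refraction seen underwater.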
To calibrate the cameras, it is first necessary to move an object with known dimensions (e.g. a 2D checkerboard or 3D cuboid) through both cameras' overlapping fields of view, capturing it from different angles.
The object is then identified across frames and viewpoints using calibration software. Whilst 3D cuboid calibrations provide the most accurate, precise and stable calibration method, A3 checkerboards present a suitable low-cost alternative (average proportional error: 1.7% for an A3 checkerboard, versus 0.5% for a cuboid and 11.6% for an A4 checkerboard; calculated in Boutros et al., 2015). With a checkerboard it is also possible to perform this calibration step in the field when deploying/retrieving cameras, making it more practical than methods involving large cuboids. Checkerboards can be printed and laminated, and must be attached to a rigid board to keep them flat. Here, we used an A3 checkerboard of eight rows by nine columns (with each square measuring 33.5 mm in length; Figure 1d). It is important to ensure that the calibration checkerboard is observed at different orientations and at multiple positions within the video frames (Boutros et al., 2015). This calibration step can be completed in ~10 min. It is necessary to repeat this calibration step on every deployment, as even slight deviations in camera positioning within the tube (e.g. when cameras are removed from tubes for battery charging or data download) alter the calibration and measurement error.
Our calibration method produced an average error of ~1%. Once calibrated, synchronised 2D tracks can be reconstructed in 3D (Figure 1e). This can be achieved using the triangulation method we present (Dunkley et al., 2023) or a programme like SEBASTES (Williams et al., 2016).
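The triangulation step can be illustrated with the textbook linear (DLT) method: each matched 2D point contributes two linear constraints on the homogeneous 3D point, which is then recovered by SVD. This is a generic sketch, not the authors' script (see Dunkley et al., 2023 for that); the camera geometry below is synthetic:

```python
import numpy as np

def triangulate(P1, P2, uv1, uv2):
    """Least-squares 3D point from two 3x4 projection matrices (from the
    spatial calibration) and one matched pixel pair (uv1, uv2)."""
    A = np.vstack([
        uv1[0] * P1[2] - P1[0],
        uv1[1] * P1[2] - P1[1],
        uv2[0] * P2[2] - P2[0],
        uv2[1] * P2[2] - P2[1],
    ])
    _, _, vt = np.linalg.svd(A)          # null vector of A = 3D point
    X = vt[-1]
    return X[:3] / X[3]                  # dehomogenise

# Synthetic check: two parallel cameras 0.7 m apart (as on the rig),
# viewing a point 2 m away.
P1 = np.hstack([np.eye(3), np.zeros((3, 1))])
P2 = np.hstack([np.eye(3), np.array([[-0.7], [0.0], [0.0]])])
X_true = np.array([0.2, 0.1, 2.0])
h = np.append(X_true, 1.0)
uv1 = (P1 @ h)[:2] / (P1 @ h)[2]         # project into each view
uv2 = (P2 @ h)[:2] / (P2 @ h)[2]
print(np.allclose(triangulate(P1, P2, uv1, uv2), X_true))  # True
```

With real footage the projection matrices come from the checkerboard calibration, and the same routine is applied frame by frame to each tracked body part to produce tracks like those in Figure 1e.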

| APPLICATIONS
This stereo camera rig is designed for long deployments in remote aquatic environments, facilitating the collection of fine-scale ecological and behavioural data. Our underwater stereo camera can record over periods of days at high temporal (30 fps) and spatial (1080 p) resolution. It offers researchers the opportunity not only to capture species abundance data, but also to use the 3D information determined from the cameras to measure the size (more accurate if calibrated with a 3D rather than a 2D object; Boutros et al., 2015) and behaviours of organisms. Such 3D field-based video data allow quantification of animals' orientations and postures. This information can ultimately be used to quantify social behaviour, feeding or aggressive interactions between species. Although these approaches have been used to autonomously map behaviours and interactions in laboratory conditions (e.g. Bala et al., 2020; Günel et al., 2019), our combination of hardware and software allows them to be used to capture natural behaviours over periods of days in the field. Such data may be particularly important for capturing the temporal effects of experimental manipulations, or behaviours that are rarely observed.

| CONCLUSIONS
Our cameras, rig and accompanying post-processing code provide an open-source, travel-friendly and accessible means to capture and analyse continuous 3D underwater video footage over a period of days. This set-up avoids the trade-offs associated with gathering high-sampling-frequency versus high-resolution versus long-term data, meaning that species presence (including rare species), and species' behaviours and interactions with the environment and one another, can be captured in finer, more complete detail. Such data can be used to answer a range of ecological questions relating to how the actions and decisions of species shape ecosystem function. Advances in artificial intelligence methods (e.g. pose estimation; see Mathis & Mathis, 2020) will facilitate the autonomous analysis of large 3D video datasets.
These camera rigs will allow a more complete picture of the health and function of underwater ecosystems.

FIGURE 1 (a) Set-up of an individual camera system showing components. (b) Stereo camera rig in situ. (c) Screenshots of a cleaning interaction filmed at 1080 p and 30 frames per second in Galapagos using the stereo camera rig presented here. Points show 2D tracks for turtle (purple), king angelfish (green) and razor surgeonfish (orange) across 465 frames. (d) Stills from left and right cameras from the stereo camera rig showing the A3 checkerboard calibration, which calibrates cameras in space. (e) 3D tracks of individuals from (c), reconstructed using camera calibration parameters.
TABLE 1 Details and costs of components used to build one stereo camera rig as of December 2022.
Full rigs take ~20 h to prepare prior to first use (without outsourcing the hole drilling or 3D printing); this timing is dictated by the speed of the 3D printing (hole drilling takes ~3 h). The rig has been designed with a low centre of gravity (camera height 500 mm from substrate) so it can withstand strong currents and swell. The design also includes adjustable legs, which facilitate adjustment of the camera's position, angle and field of view. The rig can also be adjusted to incorporate a bait pole if a baited remote stereo camera set-up is required (see Dunkley et al., 2023 for design files).