A prototype autonomous robot for underwater crime scene investigation and emergency response

Underwater crime scene investigation and emergency response are tasks typically carried out by divers constituting part of a specialist team. Operating in such dynamic environments, often with poor visibility and risk of concealed hazards, can be time consuming and dangerous. Autonomous uncrewed vessels with underwater acoustic imaging sensors have been used for similar purposes in other fields (e.g., hydrography, naval mine countermeasures [MCM], etc.) but have not been adopted in this specific application domain. The Police Robot for Inspection and Mapping of underwater Evidence (PRIME) is an autonomous uncrewed surface vessel that is being developed for this purpose. It is a novel application of existing robotic technology that is intended to be used within an end‐to‐end police and emergency underwater search process. It aims to enhance the effectiveness, efficiency, and safety of divers by autonomously locating and highlighting target objects or regions of interest, as well as benign regions, thereby reducing their time spent underwater. Side‐scan imaging sonars are used to sense the underwater environment using techniques leveraged from the similar application domain of naval MCM. The system autonomously generates actionable intelligence in the form of simplified coverage and anomaly maps for easy interpretation by the dive team. These are communicated to shore in real time and georeferenced on satellite maps. This paper details the PRIME system prototype and presents results from initial field experimentation. The prototype has been operated in various urban, shallow‐water environments. The experimental results shown here were collected in Bristol Harbour, UK, with a water depth of approximately 5 m. In the experiment, a clothed mannequin resembling a human body was deployed on the muddy floor.
Autonomous searches were executed and the body was detected successfully as an anomaly against the background, illustrating the feasibility and viability of the system as an autonomous robotic aid for locating missing persons in a representative, unstructured, and dynamic real‐world environment.


| Underwater search and recovery
Police and emergency underwater search teams generally comprise highly skilled, specialist divers (Becker et al., 2013). In the UK, dive teams exist as part of civilian police forces, such as the Metropolitan Police Service or North West Police Underwater Search & Marine Unit, as well as private groups, such as Specialist Group International. Mission types vary, but typically involve: body recovery (e.g., in missing persons cases involving accidental or intentional death); finding and retrieving items of evidence (e.g., weapons or narcotics thrown into a body of water in an effort to be concealed or destroyed); and mitigating dangerous objects, such as improvised explosive devices (IEDs). Some photographs from example cases are shown in Figure 1. The duration of a mission can vary from hours to weeks depending on the mission parameters and condition of the underwater environment. The environment is typically an inland waterway or lake with low to zero visibility and may be cluttered with obstructions or hazardous objects. Dive teams consist of specially trained members with a significant responsibility for the timely discovery, proper documentation, and recovery of any items considered to have evidentiary value (Erskine & Armstrong, 2021; Kelly, 2010; Wylie, 2019). Consequently, these teams are in very high demand relative to their limited personnel and time. Searching for missing persons, for example, requires considerable human resources, with a single dive team consisting of a minimum of 3-5 personnel.
Requirements vary depending on each case, and can sometimes require multiple teams, boats with associated crew, and cadaver dogs (Becker et al., 2013;Ruffell, 2014;Schultz et al., 2013).

| Current approaches and problems
Modern robotic and acoustic imaging technologies exist that can aid dive teams in underwater search operations, such as remotely operated vehicles (ROVs), scanning sonar devices, subbottom profilers, and water-penetrating radar (Becker et al., 2013; Decker, 2007; Erskine & Armstrong, 2021; Parker et al., 2010; Ruffell, 2014; Schultz et al., 2013). Towed side-scan sonar has been used to aid searches in missing persons cases (Erskine & Armstrong, 2021; Ruffell, 2014; Schultz et al., 2013). However, the use of such equipment can be troublesome. First, tethered devices such as ROVs and towed sonar are challenging to operate due to the risk of entanglement in cluttered environments or with underwater foliage (Schultz et al., 2013), snagging when close to shore (Ruffell, 2014), and the risk of damage to or loss of equipment. Second, such devices require significant operator training and experience for optimal operation, as well as for interpretation of the gathered data, which can accumulate rapidly (Decker, 2007; Erskine & Armstrong, 2021; Schultz et al., 2013). In practice, this results in officers reverting to more familiar and trusted manual techniques involving tight, raster-style search patterns such as those illustrated in Figure 2, which are very time consuming. These require that divers are tethered for systematic searching, as well as to ensure safety, with dives typically limited to 15-20 min shifts (Becker et al., 2013; Erskine & Armstrong, 2021; Ruffell, 2014). When employed appropriately, side-scan sonar is a highly effective tool for searching for and locating missing persons (Schultz et al., 2013). A requirement therefore exists for an autonomous system with the ability to rapidly survey underwater regions with minimal operator oversight, whilst automatically providing a simple interpretation of the collected data, thereby improving performance and safety.

| Current research
Research on autonomous human body detection using sonar is limited, but similarities can be drawn with naval mine countermeasures (MCM) operations. Target features in sonar images can be enhanced with image processing techniques, such as wavelet filtering (Hunter & van Vossen, 2014), whilst seafloor characterization and complexity mapping are used to evaluate performance of automatic target recognition (ATR) algorithms (Fakiris et al., 2013; Geilhufe & Midtgaard, 2014; Williams, 2015). Effective scanning of a given area can be enhanced by using appropriate search patterns (Hunter et al., 2018), and scanning multiple times from different orientations can improve the detection and classification performance of ATR algorithms (Fawcett et al., 2010; Zerr et al., 1997). Although objects being searched for in MCM operations are typically sound-hard scatterers as opposed to a relatively soft human body, human bone has a similar reflective coefficient to sandstone and concrete (Erskine & Armstrong, 2021), allowing for detection using sonar-based approaches. Autonomous detection of a mannequin, used as a proxy for a human target, has been demonstrated using a convolutional neural network (CNN) trained with multibeam sonar images (Nguyen et al., 2019), but these were gathered manually and processed offline.

FIGURE 1 Examples of police underwater search operations and outcomes: (a) New York City Police Department divers searching Harlem Meer following a murder in the area (McGrady, 2019); (b) an officer from the North West Police Underwater Search & Marine Unit, the UK, shown retrieving a discarded firearm (Hemans, 2012).
Commercial and research platforms employing low cost, commercially available sonar devices have been used to accurately survey shallow-water marine environments using towed or hull-mounted side-scan or multibeam sonar (Kaeser et al., 2013;Kebkal et al., 2014).
Uncrewed surface vessels (USVs) for autonomous long-term or large-scale exploration operations have been developed based on vessels ranging from basic catamaran assemblies (Girdhar et al., 2011) to the instrumentation of commercially available kayaks (Moulton et al., 2018).

| System requirements and proposed solution
There are both cost and operational barriers that impede the uptake of modern underwater acoustic imaging technologies. This has motivated the desire for a low-cost, untethered, and integrated system for aiding police divers during underwater crime scene and accident investigations that can operate autonomously and requires minimal operator training and intervention. It is further motivated by the potential to increase diver safety as a byproduct of reducing their time in the water and by giving them prior intelligence on underwater hazards. The University of Bath has been working together with representatives within the UK police and emergency services to develop and deliver such a system.
The system requirements and use cases were determined following consultation with stakeholders. The key requirements are:
1. Effective and safe: The system should enhance the search capabilities of the divers without causing interference or harm.
2. Simple: It should gather and interpret data autonomously before presenting meaningful and easily understandable information to the operator without the need for specialist training.
3. Compact: It should fit easily inside a standard-issue police 4 × 4 sports utility vehicle (e.g., Mitsubishi Shogun) and be deployable by no more than two personnel.
4. Low cost: It should cost roughly an order of magnitude less than existing ocean-grade solutions used by the commercial and military sectors, that is, on the order of tens of thousands of pounds.
The system is intended to be used at times when a dive team is on site but would not normally enter the water. Possible time windows include during mission briefing and preparation (approximately 1 h) and outside regular working hours (approximately 8 h overnight). The first use case for consideration is the search for objects that resemble human bodies. In this context, resemblance relates mostly to the approximate dimensions of an object, since a body could be in any pose or it could be concealed. Moreover, it was identified as the simplest first step due to the large size of the target object in comparison to those in other use cases (i.e., weapons, packages, or IEDs).

FIGURE 2 Illustrations of manual search patterns (Southwood, 2011): (a) linear (jackstay) search pattern and (b) circular search pattern. A jackstay search pattern allows one or more divers to search large areas thoroughly and efficiently. The divers traverse a search line (jackstay) rigged on the bottom; when the diver(s) reach the end of the line, it is advanced by a certain distance and the diver(s) search back along the line. A circular search pattern can be efficient when the position of a target is approximately known. Diver(s) search in concentric circles around a central point where a rope is fixed; a constant radius is achieved by maintaining tension on the rope, and the radius is extended by a certain distance each revolution by increasing the length of the line.

Our prototype solution is the Police Robot for Inspection and Mapping of underwater Evidence (PRIME). The system hardware is based around a custom-built USV with off-the-shelf electric propulsion, sonar sensing, navigation, and computing hardware. The software is based on the Robot Operating System (ROS) (Quigley et al., 2009) using a mixture of existing open-source packages and custom-built nodes. PRIME can autonomously execute search patterns, acquire and process data, and communicate its gathered intelligence on the underwater environment to a computer on shore in real time. It cannot currently avoid obstacles, but has a manual remote-control override for safety. The information communicated to shore is in the form of a simplified "heat map" that does not require significant training or experience to interpret. A satellite map of the surveyed area is overlaid with the cumulative sonar coverage area using a simple two-color scale, indicating regions in blue that are benign and featureless versus areas in red where the features could indicate the presence of a body. The originality lies in the real-time, autonomous generation of this actionable intelligence to aid human dive teams in missing persons scenarios, by indicating where to prioritize efforts during a search. This paper presents the design, implementation, field deployment, and testing of the PRIME system prototype. The interested reader can find comprehensive descriptions of the design and implementation details, covering the physical platform, sensing hardware, and electronics in Section 2, and the control, perception, and autonomy algorithms and software in Section 3. These technical sections can be skipped without affecting the interpretation of results from the field experiments, which are presented in Section 4.
The outcomes are discussed in Section 5 and a final summary is provided in Section 6.

| Platform
A USV was selected as the robotic platform in preference to an uncrewed underwater vehicle. This is a well-established autonomous vessel configuration, with the design developments and challenges well documented (Z. Liu et al., 2016; Manley, 2008), and has been chosen for several reasons. A USV operating from the surface has easy access to Wi-Fi and global navigation satellite systems (GNSS), which greatly simplifies communication and navigation. It is visible to the dive team and users of the waterway at all times and vice versa. It is simpler to deploy and recover, does not need complicated trimming and ballasting, and requires less stringent waterproofing.
The PRIME USV is custom-built and has been designed and developed over multiple iterations of rapid prototyping. It has a catamaran hull with dimensions of approximately 1.2 m in length, 0.6 m in width, and 0.4 m in height. The total weight of the vehicle and its typical payloads is around 20 kg. Earlier iterations (PRIME-1, -2, and -3) explored variations on the hull design. These were constructed from pairs of foam pontoons, reinforced with coatings of epoxy and fiberfill. Aluminum sheets and extrusions were used to assemble the two pontoons together and to provide a frame for attaching further hardware. Two T200 DC brushless thrusters (Blue Robotics Inc. (USA), 2021b) are mounted at the rear of the USV for propulsion. The differential drive provides good maneuverability and a maximum speed in still water of approximately 3 m/s. The latest design iteration, PRIME-4, is shown in Figure 3 and the previous iteration, PRIME-3, is shown in Figure 4a. PRIME-4 has a molded catamaran hull constructed from carbon fiber (C12 Composites Ltd. (UK), 2020). It travels at an operating speed of approximately 1.2 m/s. It also features hinged components to allow for hardware that is submerged during operation, such as the sensors and thrusters, to be raised for easy transport and storage.
Modular payloads can be mounted to the aluminum frame, with the electronics and batteries being housed in IP68-rated waterproof enclosures. This allows flexibility for experimenting with different operational concepts. Although commercial USVs with similar capabilities readily exist (Kebkal et al., 2014), a custom USV was built in the interest of research flexibility and cost saving.

| Underwater acoustic sensing
Side-scan sonar is one of the most commonly used underwater imaging technologies (Blondel, 2009). It was selected for use on PRIME due to its rapid area coverage rate and its cost advantage over more complicated multibeam systems. It is intended for the initial wide-area survey of the underwater environment to expose areas of interest for the human dive team and/or to inform autonomous reinterrogation using other sonar types or other sensing modalities (e.g., optical, chemical, magnetic, or tactile) in the future.

FIGURE 3 PRIME-4 USV with carbon-fiber hull, showing the sonar transducers partially deployed. GNSS, global navigation satellite systems; INS, inertial navigation system; PRIME, Police Robot for Inspection and Mapping of underwater Evidence; USV, uncrewed surface vessel.
Side-scan sonar operates by projecting an acoustic pulse or "ping" to the port and starboard sides of a platform at regular intervals as it moves along a nominally straight track at constant velocity. The acoustic beams are narrow in the direction of travel and wide in the vertical direction, as illustrated in Figure 5a. Thus, each ping measures the intensity of acoustic reflections within the beam cross-section as a function of acoustic travel time. These 1-D measurements are stacked to generate a 2-D raster image that is an orthographic projection of the 3-D underwater environment. The axes of the image correspond to the port and starboard ranges versus the position along the track, determined from the travel times and the known speed of sound in water c ≈ 1450 m/s. In the range axis, the image resolution is constant and is determined by the bandwidth of the acoustic signal. In the along-track axis, the resolution degrades with range and is determined by the horizontal beamwidth. The vertical beamwidth and declination of the transducers determine the observable swath of floor and, typically, this leads to a blind spot directly below the vehicle termed the nadir gap. In practice, this gap can be filled using another sensor (e.g., a downward or forward-looking sonar or camera) or by conducting surveys with overlapping coverage (Hunter et al., 2018).
The image formation process is illustrated in Figure 5.
Side-scan sonar images have the appearance of a top-down view of the floor with side illumination from the track towards the port and starboard directions. The nadir gap manifests as a strip of low reflectivity in the center of the image, corresponding to the water volume between the vehicle and the first observable range to the floor. A particular characteristic of side-scan sonar images is that an object sitting proudly on the floor appears as a highlight due to the reflection from the object, followed by an acoustic shadow cast onto the floor behind. Importantly, the orthographic projection preserves the geometrical dimensions of the object. These image features are useful for object recognition. Furthermore, there is future potential to use spectral differences between the images from each band to aid pattern recognition.
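As a numerical illustration of the resolution relations described above (range resolution set by the signal bandwidth, along-track resolution degrading with range and horizontal beamwidth), the following sketch uses the quoted speed of sound c ≈ 1450 m/s; the bandwidth and beamwidth values in the example are illustrative assumptions, not PRIME specifications:

```python
import math

C_WATER = 1450.0  # speed of sound in water quoted in the text (m/s)

def range_resolution(bandwidth_hz, c=C_WATER):
    # Slant-range resolution of a pulse-compressed sonar: c / (2B)
    return c / (2.0 * bandwidth_hz)

def along_track_resolution(slant_range_m, beamwidth_deg):
    # Along-track footprint of a real-aperture beam grows linearly with range
    return slant_range_m * math.radians(beamwidth_deg)
```

For an assumed 50 kHz bandwidth this gives a range resolution of about 1.5 cm, whilst a 1° beam observed at 30 m range has an along-track footprint of roughly 0.5 m, illustrating why the along-track resolution degrades with range.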

FIGURE 4 Selection of photographs from the various field experiments: (a) Bathampton canal, with the earlier PRIME-3 prototype; (b) Bristol Harbour, with the latest PRIME-4 prototype; and (c) clothed mannequin test target. PRIME, Police Robot for Inspection and Mapping of underwater Evidence.
During operation, the 450 and 990 kHz transducers are deployed at depths of 20 and 15 cm below the water surface, respectively, and at a declination angle of 30°.

| Navigation sensing and communications
PRIME uses an SBG Ellipse inertial navigation unit (SBG Systems, France) with an inertial measurement unit (IMU) and external GNSS antenna, both of which are mounted externally, as shown in Figure 3. This gives a positioning accuracy to within 2 m and provides accurate timing. The IMU produces pitch and roll measurements, but its main purpose on PRIME is to measure heading using its magnetic compass with accelerometer and gyroscope aiding. The heading accuracy is within 0.8°, but the compass must be calibrated with the platform fully powered and running to account for any electromagnetic interference from the platform electronics.
A Wi-Fi hub and antenna on the shore are used to facilitate communication between all of the computers in the system-both onboard the USV and on the shore for monitoring and control.

| Electronics and computing
The hardware architecture is illustrated in Figure 6. The rear electronics box houses the control hardware and the front box houses the sensing and autonomy hardware.

FIGURE 5 A side-scan sonar makes acoustic reflectivity measurements from fan-shaped beams as the platform moves along a track. These are stacked to form a 2-D raster image. Changes in the composition of the floor appear as variations in the image intensity (e.g., stronger for hard materials like gravel and weaker for soft materials like mud). A low-intensity strip at close range corresponds to the water column between the sensor and floor, including objects in the water column (e.g., fish). Objects on the floor appear as highlights followed by acoustic shadows, and objects cast longer shadows with range (e.g., the boulder). (a) Imaging geometry and (b) raster image.
The current prototype uses five lithium-polymer (LiPo) batteries for separately supplying the two thrusters (21 V fully charged, for a specified operating range of 7-20 V), the two sonar data acquisition units (25 V, for a range of 9-28 V), and the single-board computers (SBCs) in the front and rear boxes (25 and 12 V, respectively, converted to 5 V).

| Autonomy
A basic level of autonomy has been implemented that allows the USV to conduct a predefined survey pattern, whilst simultaneously processing the sonar data and reporting the mission status and underwater intelligence to the shore in real time.
The design is based on a layered hybrid deliberative/reactive architecture (Gat et al., 1997), with behaviors organized into the four broad categories of the observe-orient-decide-act model (Boyd, 1987;Proud et al., 2003). In the reactive layer of the architecture, the USV acquires data from its sensors and processes these into useful data products. It uses some of these products directly for reactive control and some for higher-level perception. In the deliberative layer, the data products are fused into a useful world model. This represents the robot's situational awareness of itself and the environment, and is used to inform the deliberative planning and execution of the mission goals. Components of the world model relating to the mission status and, importantly, the underwater intelligence are communicated to human operators on shore.
In the current implementation, the reactive layer contains behaviors relating to navigation, guidance, and thruster control for executing straight survey lines between waypoints and the acquisition, processing, and automated interpretation of sonar data. These are detailed in Sections 3.2-3.4. The deliberative layer contains a simple path planner with predefined survey waypoints and a basic world model comprised of a situation map with georeferenced overlays for the various sonar data products, including the heat map for the human dive team. These are detailed in Sections 3.6 and 3.7, respectively.

| Guidance, control, and navigation
To ensure the collection of good-quality side-scan sonar data, the platform must travel in straight lines at a consistent speed. This speed depends mainly on the ping repetition frequency of the sonar, which varies depending on several factors such as water depth, maximum range, and reverberation from the environment (Blondel, 2009). An illustrative example is shown in Figure 8 for a straight path followed by a turn, where the sonar image generated along the straight portion is better quality than the distorted and undersampled portion during the turn. The two thrusters mounted at the rear of the USV provide a differential drive: activating both thrusters equally in the same direction produces forward or reverse motion, whilst activating them unequally produces yawing motion.

The guidance system achieves straight survey tracks using the LOS method of Furfaro (2012). Figure 9 illustrates the geometry for the LOS algorithm. The location of the USV is denoted by x_p and an ideal straight path is defined between waypoints x_a and x_b. The perpendicular distance of the USV from the ideal path is given by

d = |(x_b − x_a) × (x_p − x_a)| / ‖x_b − x_a‖, (1)

and the closest point on the ideal path to the USV is given by

x_⊥ = x_a + [((x_p − x_a) · (x_b − x_a)) / ‖x_b − x_a‖²] (x_b − x_a). (2)

A lookahead point x′ is introduced on the path at a distance h ahead of x_⊥, which advances along the ideal track as the USV progresses. A LOS vector from x_p to x′ is defined to provide a target heading for the USV. The heading error is established between the USV heading θ and the target heading θ′ = arctan2(y′ − y_p, x′ − x_p), where arctan2(y, x) is the four-quadrant inverse tangent function.

FIGURE 7 PRIME-4 software architecture, represented as a simplified ROS node graph. Ellipses represent nodes, rectangles represent topics, and arrows represent the publishing and subscribing relations; the nodes and topics in white are custom-built whereas the others are standard or from existing packages. Note that in this early prototype there is closed-loop reactive feedback and control, but the deliberative control is open-loop. GPS, global positioning system; IMU, inertial measurement unit; PID, proportional-integral-derivative; PRIME, Police Robot for Inspection and Mapping of underwater Evidence; ROS, Robot Operating System; UTM, Universal Transverse Mercator.
The heading and speed errors are each regulated by a proportional-integral-derivative (PID) controller, with gains obtained using an autotuning procedure based on the Ziegler-Nichols method (Ellis, 2012).
Autotuning was carried out whilst executing a demand trajectory along a 5 m × 5 m square. The process was repeated on different bodies of water, including a canal stream, a still lake in both calm and windy conditions, and a working harbor with passing traffic.
The values obtained were then manually averaged to provide a common set of gain parameters across all environments tested.
The output range for both controllers is normalized in the range of −1 to 1.
As the USV moves along its path, the location of the lookahead point x′ advances, and the USV is guided as if being pulled towards the sliding lookahead point. The distance d can be adjusted by the choice of lookahead distance h, with a shorter lookahead reducing d and hence the path-following error. A lookahead distance of h = 2 m was found to give good performance for path tracking in the environments tested.
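The LOS geometry above can be condensed into a short routine. The names xa, xb, xp, and h follow the text, whilst the helper itself is an illustrative sketch rather than the PRIME implementation:

```python
import math

def los_guidance(xp, xa, xb, h=2.0):
    """Return (cross-track distance d, target heading in radians) for a USV at
    xp following the straight track from waypoint xa to waypoint xb, with a
    lookahead point h metres ahead of the closest point on the track."""
    (ax, ay), (bx, by), (px, py) = xa, xb, xp
    tx, ty = bx - ax, by - ay
    norm = math.hypot(tx, ty)
    tx, ty = tx / norm, ty / norm              # unit vector along the track
    d = (px - ax) * ty - (py - ay) * tx        # signed cross-track distance
    s = (px - ax) * tx + (py - ay) * ty        # along-track coordinate of x_perp
    lx, ly = ax + (s + h) * tx, ay + (s + h) * ty   # lookahead point x'
    target_heading = math.atan2(ly - py, lx - px)   # four-quadrant inverse tangent
    return d, target_heading
```

The heading error fed to the controller is then the difference between this target heading and the measured heading, wrapped into (−π, π].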
In situations when the heading error is large, for example, immediately after a waypoint has been reached, it is advantageous to prioritize heading error reduction over forward velocity. This is achieved by disabling the thrust controller and halving the yaw controller gains when the magnitude of the heading error β exceeds a user-defined value σ. A value of σ = 30° was found to give good performance.

FIGURE 8 Effect of path linearity on side-scan sonar image quality. During the straight portion of the track (A), the sonar beams produce a uniform raster coverage of the floor. However, during a turn the beams are bunched and spread on the inside and outside of the turn, respectively, leading to image distortion (B and C) and undersampling (C).

FIGURE 9 Geometry of the line-of-sight control algorithm used by the uncrewed surface vessel to traverse straight tracks between waypoints.
The PID controller outputs are fed to a thruster arbitration algorithm, which maps the thrust and yaw velocity demands, represented as a and b, respectively, and normalized between −1 and 1, to the port and starboard thruster ESC commands. The same algorithm is used to arbitrate control commands from the manual override. It is based on a method for mapping orthogonal commands, such as those from a joystick input, to a differential drive or skid-steer system, commonly used on tracked vehicles such as tanks (Taylor, 2010).
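A joystick-style differential-drive mixing of this kind can be sketched as follows; the exact PRIME arbitration function is not reproduced in the text, so the mixing and rescaling below are assumptions:

```python
def arbitrate(thrust, yaw):
    """Map normalized thrust (a) and yaw (b) demands, each in [-1, 1], to
    port and starboard thruster commands for a differential drive."""
    port = thrust + yaw
    stbd = thrust - yaw
    # Rescale so that neither command exceeds the normalized range [-1, 1]
    scale = max(1.0, abs(port), abs(stbd))
    return port / scale, stbd / scale
```

Full forward thrust with no yaw demand drives both thrusters equally, whilst a pure yaw demand drives them in opposition, turning the vehicle on the spot.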

| Side-scan sonar image processing and analysis
The side-scan sonars produce images from each track in two frequency bands on the port and starboard sides of the vehicle, that is, 4 images per track. These are processed and analyzed to generate simplified heat maps, which are fused over multiple tracks and georeferenced into a world model for the situational awareness of the USV and human operators.
The reader is referred to Section 4.1 of the results for representative examples of the sonar images, which will aid in understanding the sonar image processing and perception algorithms described in the following sections. Contrast stretching and normalization (Yang, 2006) are applied as the initial stage of the image processing chain. Consider a raw greyscale image I(u, r), where u and r are the along-track and range directions, respectively, and pixel-intensity values have been normalized between 0 and 1; contrast stretching then linearly rescales these intensities to span the full [0, 1] range.
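The exact stretching function used by PRIME is not reproduced here; the following sketch shows a common percentile-based linear stretch with renormalization to [0, 1], where the percentile choices are assumptions:

```python
import numpy as np

def contrast_stretch(img, p_lo=1.0, p_hi=99.0):
    """Linearly stretch pixel intensities between two percentile reference
    levels and clip the result back to the normalized [0, 1] range."""
    lo, hi = np.percentile(img, [p_lo, p_hi])
    stretched = (img - lo) / max(hi - lo, 1e-12)
    return np.clip(stretched, 0.0, 1.0)
```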

| Object detection
For the automated detection of targets, an approach based on 2-D wavelet filtering and complexity mapping (Geilhufe & Midtgaard, 2014) is used. Wavelet filtering is applied to emphasize objects with scales consistent with those of the body-like target objects.
The wavelet-filtered images are then simplified using an image contrast metric to quantify the feature complexity at these scales.
Regions of high complexity are thereby associated with a high likelihood of the presence of target objects (Fakiris et al., 2013;Williams, 2015).

| Wavelet filtering
The images can be represented by their wavelet decompositions. The operator W{·} performs the wavelet transform using a chosen mother wavelet function and produces a multiresolution set of subimages containing the wavelet coefficients c_m^n. The superscript n ∈ [1, N] denotes the integer scale level, and the maximum level N (corresponding to the largest scales and lowest spatial frequencies) is limited by the number of pixels in the image and the chosen wavelet. The subscript m ∈ {A, H, V, D} denotes the approximation (low-pass) coefficients and the detail (high-pass) coefficients for the horizontal, vertical, and diagonal orientations, respectively (Mallat, 1989).
Wavelet filtering can be applied to retain only coefficients at scales consistent with the expected dimensions of the target object.
Thus, a wavelet-filtered image is obtained by applying a scale-dependent window function W̃(n) to the wavelet coefficients, followed by the inverse wavelet transform.
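The scale-selective filtering can be illustrated with a single concrete choice of mother wavelet. The text does not specify the wavelet or the window W̃(n), so the Haar wavelet and the keep/zero windowing below are illustrative assumptions:

```python
import numpy as np

def haar_bandpass(img, keep_levels):
    """Decompose img (square, side divisible by 2^N) to N Haar levels, zero
    all detail coefficients outside `keep_levels` (1 = finest scale) and the
    coarse approximation, then reconstruct the band-passed image."""
    def dec(a):   # one-level 2-D Haar analysis
        a00, a01 = a[0::2, 0::2], a[0::2, 1::2]
        a10, a11 = a[1::2, 0::2], a[1::2, 1::2]
        cA = (a00 + a01 + a10 + a11) / 2.0
        cH = (a00 + a01 - a10 - a11) / 2.0
        cV = (a00 - a01 + a10 - a11) / 2.0
        cD = (a00 - a01 - a10 + a11) / 2.0
        return cA, (cH, cV, cD)

    def rec(cA, d):  # one-level 2-D Haar synthesis (exact inverse of dec)
        cH, cV, cD = d
        a = np.empty((2 * cA.shape[0], 2 * cA.shape[1]))
        a[0::2, 0::2] = (cA + cH + cV + cD) / 2.0
        a[0::2, 1::2] = (cA + cH - cV - cD) / 2.0
        a[1::2, 0::2] = (cA - cH + cV - cD) / 2.0
        a[1::2, 1::2] = (cA - cH - cV + cD) / 2.0
        return a

    n_levels = max(keep_levels)
    cA, details = img, []
    for _ in range(n_levels):
        cA, d = dec(cA)
        details.append(d)
    cA = np.zeros_like(cA)                    # drop the coarse approximation
    for n in range(n_levels, 0, -1):
        d = details[n - 1]
        if n not in keep_levels:              # zero scales outside the band
            d = tuple(np.zeros_like(c) for c in d)
        cA = rec(cA, d)
    return cA
```

Because the transform is linear, reconstructing each retained scale separately and summing gives the same result as reconstructing them together, which is a convenient sanity check for the windowing step.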

| Image complexity ("Heat") map
Regions of interest, containing potential target objects, are identified automatically by computing a complexity map from the wavelet-filtered image. The root-mean-square contrast metric (Fortune et al., 2001; Peli, 1990) is used to quantify feature complexity, evaluated over a sliding rectangular window of P × Q pixels; the complexity value assigned to each window position is the root-mean-square deviation of the windowed intensities from their local mean. This procedure is applied independently to both the low- and high-frequency images and the results are then averaged.
A window with dimensions P = 48 and Q = 96 was chosen with an aspect ratio of 2, roughly corresponding to the dimensions of the target object. Sliding steps of ΔP = 4 and ΔQ = 8 were used as a trade-off between the smoothness of the resulting complexity map and the computational load. These simplified data products are communicated to the end users instead of the more complicated sonar images. They are referred to as "heat maps" due to the choice of color scale, which uses red to represent regions of high complexity, indicating the likelihood of a target object, and blue for low-complexity regions that are likely to be benign.
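Using the window and step sizes quoted above, the sliding-window computation can be sketched directly (a straightforward loop, not an optimized implementation):

```python
import numpy as np

def rms_contrast_map(img, P=48, Q=96, dP=4, dQ=8):
    """Root-mean-square contrast evaluated over a sliding P x Q window,
    stepped by dP (along-track) and dQ (range)."""
    rows = range(0, img.shape[0] - P + 1, dP)
    cols = range(0, img.shape[1] - Q + 1, dQ)
    cmap = np.empty((len(rows), len(cols)))
    for i, u in enumerate(rows):
        for j, r in enumerate(cols):
            window = img[u:u + P, r:r + Q]
            cmap[i, j] = window.std()   # RMS deviation from the window mean
    return cmap
```

The maps computed from the low- and high-frequency images are then averaged and rendered with the red-blue color scale to form the heat map.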

| Floor detection
The location of the water-floor boundary must be determined to facilitate the removal of the water column and to estimate the depth for georeferencing the sonar images and heat maps. Wavelet filtering is used to isolate features of appropriate scale that are parallel with the path, by adjusting the parameters of the procedure described in Section 3.4.1 so that a "high-pass" window isolates only the vertical detail. This is followed by edge enhancement, using the Sobel edge detector (Sobel, 1970), and peak detection to determine the boundary, which is smoothed using a moving-average filter. The Sobel operator with a kernel size of seven points was found to provide satisfactory enhancement of the water-floor boundary in this application, and was implemented via the functions available in the Python Open Computer Vision library (Bradski, 2000). The range to the boundary is estimated for each along-track position x by finding the onset of the peak in the edge-enhanced image: the boundary is taken at the range where the intensity first reaches a fraction A of the peak value, and the resulting estimates are smoothed along the track with a filter h(u). A value of A = 0.6 and a 30-point moving-average filter were found to work well.
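The per-ping onset rule and along-track smoothing can be sketched as follows. This is illustrative NumPy code; the function names `floor_range` and `smooth_track`, and processing one ping at a time, are assumptions about the implementation:

```python
import numpy as np

def floor_range(profile, A=0.6):
    """Peak-onset range estimate for a single ping.

    profile is a 1-D array of edge-enhanced intensity versus range bin.
    The boundary is taken as the first bin, at or before the peak,
    where the intensity reaches the fraction A of the peak value.
    """
    peak = np.argmax(profile)
    above = np.nonzero(profile[:peak + 1] >= A * profile[peak])[0]
    return above[0]

def smooth_track(ranges, n=30):
    """Moving-average smoothing of the per-ping range estimates."""
    kernel = np.ones(n) / n
    return np.convolve(ranges, kernel, mode="same")
```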
The approximate depth of the floor is then estimated along the track from the range to the nadir, R_min, and the known sonar geometry, where ϕ is the declination angle, Δϕ is the vertical beamwidth, and D is the depth of the sonar transducer. A constant depth with range is assumed, that is, a flat bottom. The depth estimate is used in the image georeferencing process, described in Section 3.7.

| Survey planning and execution
Survey plans are currently predefined by an operator, but executed autonomously by the USV. The plan is specified manually as a list of waypoints in longitude and latitude. The waypoints are converted to local Universal Transverse Mercator coordinates before being passed on to the LOS controller described in Section 3.2. The LOS controller requires two waypoints, x_a and x_b, to define a track, as illustrated in Figure 9. Upon initializing the controller, the current position of the USV is taken as x_a (previous waypoint), and the first waypoint in the list is taken as x_b (current waypoint). The USV is guided towards x_b and, once reached, x_a is replaced by x_b, and x_b is replaced by the next waypoint in the list. A waypoint is considered to be reached when the distance to it is below a user-defined tolerance T, for which T = 2 m has been found to be a suitable value. The process repeats until the waypoint list is exhausted and the survey is complete.
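The waypoint-sequencing logic above can be sketched as follows. The function and callback names are hypothetical; `steer_towards` stands in for one control step of the LOS controller, whose internals are described in Section 3.2:

```python
import math

def run_survey(waypoints, position, steer_towards, tol=2.0):
    """Sequence waypoints for the LOS controller.

    waypoints: list of (x, y) positions in local UTM metres.
    position() returns the current USV position; steer_towards(xa, xb)
    advances the vehicle one control step along the track from xa to
    xb.  A waypoint counts as reached when the distance to it falls
    below the tolerance tol (2 m in the text).
    """
    xa = position()  # track start: current position at initialization
    for xb in waypoints:
        while math.dist(position(), xb) >= tol:
            steer_towards(xa, xb)
        xa = xb  # the reached waypoint becomes the next track's start
```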
A typical survey pattern is comprised of several "lawnmower" patterns of multiple straight tracks, similar to a paired-track survey (Hunter et al., 2018), with each pattern oriented at a different fixed angle. Multiple orientations are used to observe the floor from different look angles, thus increasing the likelihood of detection (Fawcett et al., 2010; Zerr et al., 1997). More orientations with smaller angular increments provide a more thorough search at the cost of a longer survey duration, so the two must be traded off.
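Although the paper's plans are drawn manually, generating one such rotated "lawnmower" pattern is mechanical enough to sketch. The function below is illustrative, not the system's planner; the local frame and origin convention are assumptions:

```python
import math

def lawnmower(width, height, spacing, angle_deg=0.0, origin=(0.0, 0.0)):
    """Waypoints for a 'lawnmower' pattern of parallel straight tracks.

    Tracks of length `height` are spaced `spacing` apart across
    `width`, then rotated by angle_deg about the origin so that
    several patterns at different orientations can be combined into
    one survey plan.
    """
    a = math.radians(angle_deg)
    cos_a, sin_a = math.cos(a), math.sin(a)
    pts = []
    x, up = 0.0, True
    while x <= width:
        ys = (0.0, height) if up else (height, 0.0)  # alternate direction
        for y in ys:
            # rotate the local (x, y) waypoint into the survey frame
            pts.append((origin[0] + x * cos_a - y * sin_a,
                        origin[1] + x * sin_a + y * cos_a))
        x += spacing
        up = not up
    return pts
```

Calling this several times with different `angle_deg` values and concatenating the results yields the multi-orientation plan described above.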

| Situation map
A world model is created and maintained as the USV collects and processes data throughout its survey. This is stored in the form of a layered situation map, which is updated and communicated in real-time. It conveys information obtained from the survey (i.e., georeferenced sonar images and heatmaps), the survey plan and execution status (i.e., waypoints, planned and executed tracks), and USV state information (i.e., position, speed, and heading), all overlaid on a satellite map. Currently, the situation map acts as a high-level interface to the operator. In the future, it will be the basis for feedback into a deliberative controller for task planning, sequencing, and execution.

| Georeferencing
The sonar images and corresponding heatmaps from each completed track are georeferenced and presented as layers with adjustable transparency on the map. The georeferencing is performed by splitting each image (and heatmap) into multiple along-track sections of length L. Each section is interpolated from local image coordinates (u, r) into the global coordinates (x, y) via a projective transform to build a geometric map (Burguera & Oliver, 2016). Thus, a georeferenced image from a track comprising K sections is assembled by applying the projective transform of each section in turn. The transform for the kth section is defined by mapping its corners (i.e., the minimum and maximum ranges at the beginning and end of the section) to the appropriate locations on the floor, with positive and negative cross-track offsets used for the port and starboard sides, respectively. This mapping is illustrated in Figure 8 and in Figure 10, using the ground distance to the maximum observed range R_max.
Here, images are divided into sections of length L = 1 m. Sections that are within the tolerance distance T of the waypoint are also removed to exclude severely distorted projections that occur during turning, as demonstrated in Figure 8b.
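The per-section corner mapping can be realized with a standard projective (homography) estimate. The sketch below uses a generic direct linear transform in NumPy; in practice a library routine such as OpenCV's perspective-transform functions would serve the same purpose, and the corner coordinates here are purely illustrative:

```python
import numpy as np

def homography(src, dst):
    """3x3 projective transform mapping 4 source corners to 4 targets.

    src and dst are 4x2 arrays of (u, r) image corners and (x, y)
    world corners.  Solves the direct linear transform via SVD.
    """
    A = []
    for (u, v), (x, y) in zip(src, dst):
        A.append([u, v, 1, 0, 0, 0, -x * u, -x * v, -x])
        A.append([0, 0, 0, u, v, 1, -y * u, -y * v, -y])
    _, _, Vt = np.linalg.svd(np.asarray(A, dtype=float))
    return Vt[-1].reshape(3, 3)  # null-space vector, up to scale

def project(H, pts):
    """Apply H to an Nx2 array of points, with homogeneous divide."""
    p = np.hstack([pts, np.ones((len(pts), 1))]) @ H.T
    return p[:, :2] / p[:, 2:3]
```

For each of the K sections, the four image corners are mapped this way to their georeferenced floor positions and the section pixels are interpolated accordingly.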

| Heatmap fusion
The georeferenced heatmaps produced from all of the tracks are collated at the end of the survey to fuse the processed information gathered from multiple orientations. This concept has been shown to improve the performance of ATR algorithms (Fawcett et al., 2010; Zerr et al., 1997). A simplified implementation is used here: a minimum threshold C_min is first applied to the individual heatmaps, so that only excess values of significance are accumulated.

A plastic mannequin approximately 1.8 m tall has been used as a test target to resemble an adult human body. The mannequin has been clothed, as shown in Figure 4c, to provide a more realistic acoustic scattering signature. When deployed, it is weighted with a small sand-filled plastic weight, causing it to fill with water and sink to the floor. A loose rope is also attached and tethered to the shore for recovery afterwards. A time-varying gain is applied to compensate for the acoustic spreading and (frequency-dependent) attenuation loss.
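The thresholded accumulation can be sketched as follows. This is a minimal version; the NaN convention for cells a track does not cover, and plain summation rather than averaging of the excess, are assumptions about the implementation:

```python
import numpy as np

def fuse_heatmaps(heatmaps, c_min):
    """Fuse georeferenced heatmaps from multiple tracks/orientations.

    Each heatmap is a 2-D array on a common grid, with NaN where a
    track has no coverage.  Values are thresholded at c_min so that
    only the excess above the threshold accumulates across maps.
    """
    total = np.zeros_like(heatmaps[0], dtype=float)
    for h in heatmaps:
        excess = np.nan_to_num(h, nan=c_min) - c_min
        total += np.clip(excess, 0.0, None)  # only significant excess counts
    return total
```

A cell flagged from several look angles therefore accumulates a much higher fused value than a one-off response, which is what suppresses spurious single-track detections.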

| Representative images from Bristol Harbour
There are some features in the images that do not correspond to the underwater environment but are artefacts of the system. A low-intensity strip is visible on the floor at close range on both sides in the 990-kHz image. This is caused by a null and sidelobe of the vertical sonar beam, which is narrower at the higher frequency. These beam patterns can be compensated using intensity correction techniques (Burguera & Oliver, 2014) but, in this system, the region is covered adequately by the low-frequency band. Despite efforts to minimize electrical interference, some cross-talk from out-of-band harmonics generated at 450 kHz is present. This manifests as a narrow noisy feature in the water column of the 990-kHz images that is symmetrical about the port and starboard sides.
The wavelet-filtered result of the 990-kHz image is shown in Figure 12.

| Autonomous survey in Bristol Harbour
A preplanned survey pattern was defined over a rectangular region of 50 m × 30 m, with its longest edge adjacent to and aligned with the shore. Survey waypoints were arranged within the region to form a series of "lawnmower" patterns. The full survey plan of overlapping patterns is shown in Figure 13a. A single pattern from this plan is shown in Figure 13b alongside the actual path executed by the USV. Some overshoot and recovery can be seen at the ends of the tracks, and there is also an offset of approximately 1 m caused by the LOS controller overcoming the water current. Figure 13c,d shows two other patterns from the survey, with the emphasized tracks corresponding to the example images used in Figures 11 and 12.
The USV executed the plan and communicated the situation map to the shore in real-time. Figure 14 shows several views of the situation map throughout the survey.
The georeferenced 990-kHz side-scan sonar images from one of the tracks (Figure 13c) are shown in Figure 14a, where it is overlaid as a layer on the satellite view. The features from the harbor wall and foundations (shown in Figure 11 and described in Section 4.1) can be seen to align well with the satellite view.
Moreover, the target can be observed at the known deployment location. The corresponding heatmap layer (Figure 12b) is shown in Figure 14b with the target highlighted clearly in red. However, the harbor wall has also been identified as a false positive, the effect of which is reduced by the averaging in the fused heatmap (Figure 14d). This is the key output, produced autonomously in real time by the system, that provides actionable intelligence to the dive team.

| DISCUSSION
Results from field testing of PRIME have shown that the current prototype can autonomously navigate a region defined by GNSS waypoints, map the underwater space with side-scan sonar sensors, and detect and localize a human-body-shaped object located on the floor. The system presents actionable intelligence to the user in a simple and meaningful manner, as shown in Figure 14d. The fused result from six survey patterns, as illustrated in Figure 13, with an average length of 550 m per pattern and a total length of 3.3 km, was produced in under an hour.
In contrast, a single pattern would take over four and a half hours if carried out manually by a professional diver swimming at a moderate speed of 0.5 m/s (Wojtków & Nikodem, 2017). The heatmap effectively shows the target location, as well as benign regions, allowing for the area to be more efficiently searched by prioritizing regions of interest. This outcome supports the feasibility, viability, and utility of using an autonomous robotic aid for police and emergency underwater search operations.
Additional prior information about a search region increases the likelihood of a successful outcome, and minimizing the time spent in the water inherently increases diver safety (Erskine & Armstrong, 2021). In cases where a site requires revisiting, the georeferenced heatmap produced in Figure 14d allows the search region, and features within it, to be accurately located.
Further development is required to produce a more robust and reliable system. For instance, the waypoint list is generated manually for a given search area and time-frame, thus requiring the operator to understand the sonar coverage requirements and geography. Ideally, the search patterns would be generated automatically, taking into account land-water boundaries and water depths, as well as user-defined mission parameters, such as areas of interest and exclusion zones. Furthermore, the system currently lacks the capability to autonomously detect and avoid obstacles and thus requires monitoring by a user to implement a manual override when necessary. This problem is a well-documented area of research (Mousazadeh et al., 2018; Polvara et al., 2018; Wu et al., 2017), and further tests are required to evaluate which approach is most suitable for PRIME.
The path following controller described in Section 3.2 has been found to be robust enough to give sufficiently smooth, straight paths that provide good quality, undistorted imagery. However, since deviations from a linear track cause image distortion, in the future it may be desirable to include correction methods that compensate for vessel motion (e.g., Blondel, 2009;Burguera & Oliver, 2016) to ensure robustness under heavy currents or winds. Open-source versions of several of these processing techniques have been made available (Buscombe, 2017).
A pragmatic approach has been taken to improve target detection quality and lessen the impact of false positives and negatives, whereby multiple sonar images are analyzed, taken from different positions and orientations, and also from two frequency bands. However, performance has not been evaluated in heavily cluttered environments, which may result in increased false positives during object detection using the current methods described in Section 3.4. Nonetheless, the experiments shown in Section 4 demonstrate the system concept as an end-to-end solution in a representative environment. Future work will evaluate more diverse sites and scenarios (e.g., varying water depths, bottom types, and more cluttered environments). Autonomous identification of smaller objects, such as weapons or IEDs, and discrimination between multiple object types, including potential hazards (e.g., sharp objects, entanglement risks, etc.), will require a more sophisticated sensing and/or machine learning approach. However, any algorithms developed in the future can easily be integrated by virtue of the modular nature of ROS. Similarly, additional sensors can be added to improve target detection and classification capabilities.
Some further refinement is necessary for an end product. The current power electronics setup uses multiple batteries running at different voltages, powering components independently. While this has been acceptable during development, it would be impractical in an operational system. In future prototypes, all modules will share a single battery bank (e.g., 6S LiPos at 25 V), battery management system, and regulated power rails of 5 and 18 V that are accessible via a common charging port but electrically isolated to avoid interference.
The current communications architecture relies on a single Wi-Fi network set up from a router situated on the shore.
This single-network architecture is simple and has been convenient for development, but its weakness is that critical communication between the onboard computers is disrupted when the channel is blocked, for example, when the vehicle travels out of range or is occluded. Future improvements to the communication architecture will use separate bridged networks onboard the USV and on the shore, and will add compatibility with cellular networks. Furthermore, in operational missions, it will be crucial to use encrypted networks (and onboard memory) for cyber security and data protection.

This is the groundwork for a future operational concept in human-machine cooperation within the police and emergency services. Reaching this vision will require further research and development to elevate the prototype to a higher level of autonomy and technology readiness. To this end, future work will explore the use of more advanced perception and control algorithms (e.g., deep learning, active learning, etc.) to improve the quality of situational awareness and autonomy. It is expected that this will enable more sophisticated decision-making that allows the system to carry out more challenging tasks, adapt to uncertain environments and unexpected situations, and be trusted to operate safely and effectively with minimal training and oversight.

ACKNOWLEDGMENTS
The authors are grateful to the Engineering and Physical Sciences Research Council (EPSRC) for funding this work. They also thank the University of Bath Mechanical Engineering technicians, including Mike Linham, for their assistance in building the various prototypes.

DATA AVAILABILITY STATEMENT
The data that support the findings of this study are available from the corresponding author upon reasonable request.