Mapping Submerged Archaeological Sites using Stereo-Vision Photogrammetry



Creating photo-mosaics and plans of submerged archaeological sites quickly, cost-effectively and, most importantly, to a high level of geometric accuracy remains a huge challenge in underwater archaeology. This paper describes a system that takes geo-referenced stereo imagery from a diver-propelled platform and combines it with mapping techniques widely used in the field of robotic science to create high-resolution 2D photo-mosaics and detailed 3D textured models of submerged archaeological features. The system was field tested on the submerged Bronze Age town of Pavlopetri off the coast of Laconia, Greece, in 2010. This paper outlines the equipment used, data collection in the field, image processing and visualization methodology.

Sonar technology is currently the default choice of archaeologists and oceanographers when attempting to map large areas of the sea-bed (Green, 2004: 74–84; Ballard, 2007; Bowens, 2009: 103–111). While the long ranges of acoustic signals allow large areas to be mapped quickly, the post-processing and, most importantly, the actual interpretation of sonar data can be difficult, complicated by the effects of acoustic reflectivity and backscattering which also create geometric inaccuracies (Mitchell and Somers, 1989; Sakellariou, 2007; Capus et al., 2008). In contrast, while optical sensors can provide high-resolution images that are easy to interpret, their range is limited to a few metres due to the absorption and scattering of light in water. As a result short-range optical sensors have to be physically transported close to the underwater features being imaged. This is usually done by attaching the sensors to Remotely Operated Vehicles (ROV) or similar systems which descend to the sea-floor from a boat or similar working platform (Ballard et al., 2002; Bingham et al., 2010; Royal, 2012). Such an approach can result in running costs which are prohibitive for archaeological budgets, whilst the large size of many ROVs and their support ships make them poorly suited for operations in shallower water.

At present, detailed underwater archaeological surveys of sites are typically conducted by scuba divers using baselines, fixed site grids, measuring tapes, drawing frames and photogrammetry (Bowens, 2009: 114–134). While such approaches can be very effective, they tend to be time consuming and extremely labour intensive. The recording of submerged sites through the creation of 2D photo-mosaics constructed from overlapping images is common practice (Ballard et al., 2002; Foley et al., 2009). However, the problems with geometrical accuracy over single images and larger areas are widely known (Telem and Filin, 2010; Treibitz et al., 2012); optical distortion through the water, camera tilt and variations in the topography of the area being photographed usually mean that the assumptions for planar mosaics are not generally valid in underwater surveys. As a result only small groups of photos can be effectively grouped together, with mosaics of larger areas becoming less geometrically reliable as errors are compounded the further one builds the mosaic from the centre of the first image (Green, 2004: 169). Attempts to geometrically rectify images prior to assembling mosaics using post-processing software have had some success (Martin and Martin, 2002), but the primary acquisition of the images still requires the laborious setting up of accurate grids, positioning of reference targets and the use of rigid bipod frames or towers to ensure pictures are taken from a constant height and the film plane remains horizontal. Equally, such techniques remain difficult to use in undulating terrain and on upstanding 3D archaeological monuments such as shipwrecks and harbours.

The system described here represents a compromise between the expensive deployment of AUV technology, and traditional diver-orientated image-by-image photography. It can record large areas (∼100 m × 100 m) quickly and accurately and works effectively to record significant 3D structures in both undulating and flat bottom conditions. The approach requires no grids or control points to be set up, although during field testing, as described below, a system of ropes was used. However, it should be stressed that the ropes served as a guide for the divers and snorkellers in the water and were not a requirement for the effective operation of the equipment and recording of visual imagery. The key to the effectiveness of this technique in producing geometrically accurate photo maps is the application of Simultaneous Localization and Mapping (SLAM) techniques widely used by the mobile robotics community (Bailey and Durrant-Whyte, 2006; Durrant-Whyte and Bailey, 2006). SLAM can be thought of as a form of incremental bundle adjustment, which allows for observations of orientation, velocity and rates to also be incorporated into the estimate of the camera positions and scene structure. Research into SLAM has been one of the most active areas in robotics over the past decade and is used in real time to construct accurate maps of unknown environments, while at the same time tracking the location of the mobile robot being used to map that environment. SLAM combines data recorded on the position and orientation of the robot itself simultaneously with the visual data of the environment being recorded by the robot. In this instance stereo imagery was collected using the diver rig on-site and then a SLAM algorithm was applied to the images in post-processing (Mahon et al., 2008). 
An automated post-processing procedure applied to the images resulted in the production of photogrammetric plans and textured 3D models in a fraction of the time and to a higher geometric accuracy than can be achieved using traditional techniques (Johnson-Roberson et al., 2010).

Test site: Pavlopetri, Laconia, Greece

The submerged Bronze Age town of Pavlopetri lies close to the shore in just 1–4 m of water in the west end of the Bay of Vatika in south-eastern Laconia, Greece. The remains cover an area of approximately 60,000 sqm and comprise a network of stone walls, building complexes, courtyards, streets, graves and rock-cut tombs. The stone walls, which would have formed the foundations for mud brick and plaster buildings, are made of uncut aeolianite sandstone and limestone blocks and were built without mortar (Fig. 1). They can survive up to three stones in height but the vast majority survive only one course high, or are completely flush with the sea-bed. The dating of the architectural features and surface finds such as ceramics suggest the site was inhabited from at least the Early Bronze Age, c.3000 BC, through to the end of the Late Bronze Age, c.1100 BC. At its height, the settlement was likely to have had a population of between 500 and 2,000 people.

Figure 1.

General view of the corner of Building I at Pavlopetri looking south (scale 1 m). Typically the site appears on the sea-bed as a network of low un-mortared limestone and aeolianite walls. (Jon Henderson)

In 1968 a team from the University of Cambridge surveyed the submerged remains using a fixed 20-m-grid system and hand tapes (Harding et al., 1969; Harding, 1970). No further work was carried out at the site until 2009 when the University of Nottingham, the Ephorate of Underwater Antiquities, and the Hellenic Centre for Marine Research (HCMR) started a five-year collaborative project to investigate and record the town using modern methods (Henderson et al., 2011). One of the primary objectives of the survey phase of the current project (2009–2011) has been to record the site in as much detail and as accurately as possible, in the process field testing a range of new technologies. This has had the dual function of achieving complete preservation by record of the fragile remains of the site through a range of formats, while at the same time promoting the development of new methods of underwater archaeological survey.

The central spine of the project at Pavlopetri has been shore-based Total Station survey to create an accurate point-based vector-line plan of the town (Henderson et al., 2011: 209–210). From 2009 to 2010 a robotic total station was used to target divers taking 3D points in the water using a detail pole equipped with a prism. The points are displayed on the computer screen as they are taken, which is particularly useful in shallow-water survey as it means that grids and tapes do not have to be set up under water and divers can accurately position themselves at any point on the site at any given time (Henderson and Burgess, 1996). Using bubble levels on the detail pole, the maximum error using this system was found to be less than 50 mm at a full pole extension of 5 m. The most common staff height was 3.5 m, ensuring errors of less than 35 mm for the majority of the points taken. The accuracy of the instrument was tested twice a day on known reference points on the site. The shallow nature of the site and its proximity to the shore allowed the survey to be carried out to levels of accuracy comparable to those achieved on terrestrial sites (Figs 2 and 3). As a result the Total Station survey provides a baseline template against which the results from all the other technologies used on the site can be tested. While the Total Station survey continued in 2010, the Australian Centre for Field Robotics (ACFR) was invited to join the project to develop a robust stereo-photogrammetric mapping method for submerged archaeological remains.
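
The two error figures quoted above are consistent with a simple model in which the horizontal error at the prism grows in proportion to the pole height. The following is a minimal sketch, assuming the error arises only from a small residual tilt of the detail pole. The tilt angle is inferred here from the 50 mm at 5 m figure and is not stated in the survey report, and the function names are illustrative.

```python
import math


def tilt_from_error(pole_height_m, horizontal_error_m):
    """Tilt angle (radians) that would produce the given horizontal error."""
    return math.asin(horizontal_error_m / pole_height_m)


def horizontal_error(pole_height_m, tilt_rad):
    """Horizontal offset of the prism for a pole tilted by tilt_rad."""
    return pole_height_m * math.sin(tilt_rad)


# Worst case quoted in the text: 50 mm at a full pole extension of 5 m.
max_tilt = tilt_from_error(5.0, 0.050)

# The same residual tilt at the most common staff height of 3.5 m.
error_at_3_5_m = horizontal_error(3.5, max_tilt)

print(f"implied tilt: {math.degrees(max_tilt):.2f} deg")
print(f"error at 3.5 m: {error_at_3_5_m * 1000:.0f} mm")
```

Under this assumption a tilt of a little over half a degree accounts for both figures, which suggests the quoted errors are bounded by how well the bubble level can be held rather than by the instrument itself.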

Figure 2.

Total Station survey of the Pavlopetri archaeological site. The lines in purple mark the area surveyed by the diver rig. (Pavlopetri Underwater Archaeology Project)

Figure 3.

Total Station survey equipment and operation including land station (inset) and swimmers over the site with prism detail pole. (Pavlopetri Underwater Archaeology Project)


Stereo-vision diver rig

A unit which could be used in shallow water to collect digital image and sensor data was constructed by the ACFR for use at Pavlopetri. Referred to as the ‘diver rig’, the unit consists of the cameras, lighting, sensors, instrumentation and power source needed to take high-resolution images of the sea-bed, arranged inside a rigid but highly portable carbon-fibre and balsa-wood frame (Fig. 4 and Table 1). The core of the system consists of two high-resolution digital cameras set up as a stereo pair pointing downwards. The stereo pair consists of one colour and one gray-scale AVT Prosilica GC1380 camera. They have very sensitive 1.4M Pixel 2/3″ CCDs. The cameras have 8-mm lenses that provide a field of view of approximately 42 deg × 34 deg in water. At 2 m altitude, this results in a footprint of approximately 1.5 m × 1.2 m and a spatial resolution of ∼1 mm/pixel. The two cameras are triggered simultaneously by a microcontroller typically at 2 Hz, providing three to five views of the same scene point at typical speeds and altitudes. Two LED strobe units were used, 0.5 m fore and aft of the down-looking stereo cameras. The cameras and strobes were used in an auto-exposure mode that would adjust the exposure time to achieve an average intensity of 30% of the range value. This allowed operating under a wide range of lighting conditions while maintaining similar illumination levels. The secondary instruments consisted of a pressure sensor (depth), surface GPS receiver, and a solid-state IMU (for attitude). The design of the system was based on the same component array used and tested in AUV and ROV configurations by the ACFR on previous oceanographic research missions (Johnson-Roberson et al., 2010). The diver rig differed only in that it used a subset of secondary instruments and was designed to be manipulated and propelled by divers in the water rather than being attached to an AUV or ROV.
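
The footprint and resolution figures follow directly from the field of view and the altitude. A minimal sketch of that geometry is given here, assuming a simple pinhole camera looking straight down at a flat sea-bed. The pixel grid of 1360 × 1024 is an assumption standing in for the '1.4M Pixel' sensor described above, and the names are illustrative.

```python
import math


def footprint(altitude_m, fov_deg):
    """Length of sea-bed seen across one field-of-view angle."""
    return 2.0 * altitude_m * math.tan(math.radians(fov_deg) / 2.0)


ALTITUDE_M = 2.0
FOV_DEG = (42.0, 34.0)        # in-water field of view quoted in the text
SENSOR_PIXELS = (1360, 1024)  # assumed pixel grid for a 1.4-megapixel sensor

width_m = footprint(ALTITUDE_M, FOV_DEG[0])
height_m = footprint(ALTITUDE_M, FOV_DEG[1])
mm_per_pixel = width_m / SENSOR_PIXELS[0] * 1000.0

print(f"footprint: {width_m:.2f} m x {height_m:.2f} m")
print(f"resolution: {mm_per_pixel:.2f} mm/pixel")
```

Because the footprint scales linearly with altitude, halving the altitude halves the footprint in each direction and doubles the number of images needed per unit length of trackline.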

Figure 4.

The stereo-vision diver rig built for Pavlopetri and list of components. (Oscar Pizarro)

Table 1. Diver rig components
Imaging Module | Power Module | External | Rig
Stereo baseline: 75 mm between cameras | Batteries: Oceanserver 190 Wh Li-ion pack (3 hr typical) | Depth: Seabird pressure sensor | Depth rating: 150 m
Separation: 0.5 m between camera and lights | Attitude: Microstrain 3DM-GX1 solid state AHRS | Lighting: Two 12000 Lumen LED arrays, 4 ms on time |
LED driver: Gardasoft PP-520F | Communications: Digikey Portserver (4 serial ports) | GPS receiver: SPK-GPS-GS405 |
Processing: ADL945PC Core 2 Duo PC/104 Plus | | |
Storage: 320 GB (8 hr typical) | | |

The camera system was designed for use on robots collecting millions to tens of millions of images during its lifetime. The housings are not meant to be opened during fieldwork (downloading imagery and charging is done through a cable that connects to a pressure-tolerant bulkhead connector), which reduces chances of failure through leaks from improper sealing, and minimizes potential problems with calibration. While a DSLR can provide higher resolution at a reasonable cost, it is much harder to build a flexible data acquisition system around one and to log and time-stamp other sensor data consistently. It also typically involves opening housings for data downloads or charging, which increases the chances of a leak or catastrophic housing failure. In addition, a camera with a mechanical shutter only has a life of a few hundred-thousand cycles. At 2 Hz, this represents tens of hours of surveying before failure. Given the limited field of view and range that can be effectively used under water, a higher resolution per frame would not lead to a lower number of images, only to increased spatial resolution. In practice, given computational and memory constraints, the 3D point cloud and meshes generated from each stereo pair are simplified approximations using ∼1000–2000 vertices rather than the ∼1M that would be theoretically possible.
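
The 'tens of hours' figure for a mechanical shutter is a matter of simple arithmetic. A short sketch follows, in which the rated cycle counts of 200,000 and 300,000 are illustrative assumptions for 'a few hundred-thousand cycles'.

```python
def hours_to_shutter_failure(rated_cycles, frame_rate_hz):
    """Continuous survey time before the rated shutter life is used up."""
    return rated_cycles / frame_rate_hz / 3600.0


FRAME_RATE_HZ = 2.0
for rated_cycles in (200_000, 300_000):  # assumed shutter ratings
    hours = hours_to_shutter_failure(rated_cycles, FRAME_RATE_HZ)
    print(f"{rated_cycles} cycles at {FRAME_RATE_HZ:g} Hz: {hours:.0f} h")
```

With around 47 hour-long swims reported for the 2010 season, a shutter in this range would have been close to or past its rated life within a single campaign.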

Operation in the field

With the diver rig on-site for only ten available working days during the 2010 season it was not possible to map the site in its entirety. Instead a survey plan was developed giving priority to areas containing structures of most archaeological importance. An effective method of operation to ensure full image coverage in the areas selected for photo-mapping was developed by the archaeological and ACFR teams during this time. The use of diving equipment to operate the diver rig was found not to be necessary at Pavlopetri due to the shallow nature of the site. Most of the features mapped lay in 1.5–2.5 m of water, which provided optimal ranges for imaging purposes. A side benefit of this was that the water surface acted as a horizontal control, ensuring the pictures were taken from a relatively constant height and that the film plane remained horizontal. During the survey the extreme ranges of use of the diver rig were found to be between 0.75 m and 4 m; deeper than 4 m resulted in unacceptable levels of backscatter, while at close range focus was difficult and coverage became problematic as the resulting image footprint was small, meaning operations would be much slower and require many tightly spaced parallel tracklines to obtain full coverage.

Because the diver rig was used on the surface, it could take GPS readings throughout the survey, ensuring that the location of the diver rig could always be determined and that the data recovered from it were geo-referenced. As a result there is no need to set up accurate grids or reference points for the effective functioning of the diver rig. That said, a loose system of ropes was used by divers to delimit areas of the site being imaged so that the diver-rig operators could quickly position themselves on the site and could easily ascertain which areas had been imaged and which were still to be done. The positions of the ropes used during imaging were surveyed by the shore-based Total Station system as a further level of control on the data being recovered from the diver rig.

Surveying with the diver rig was performed by a team of four. A ‘mow the lawn’ survey pattern with overlapping parallel tracklines was used to provide complete coverage of the sea-bed area being imaged. High overlap between adjacent tracklines is advantageous when estimating the diver-rig motion from the imagery (and then generating the 3D meshes) but the greater the overlap, the greater the survey time and resulting processing time. During the survey we aimed for a 50% overlap as a compromise. Two snorkellers took turns operating the diver rig, while the remaining two people maintained navigation aids at either end of the swim lines to ensure the operators swam in straight lines and moved ∼0.75 m along after the completion of each line. Initially two snorkellers were used to shift two surface floats connected by a 10 m or more length of rope along the area being imaged (this area being delimited under water by two ropes 10 m apart running perpendicular to the surface float ropes). This method, however, proved to be insufficient to maintain straight lines when operating in currents, resulting in small gaps in the overlapping imagery. An improved system was then used, in which two divers held poles between which a taut rope was tied, providing a superior swim line on the surface for the diver-rig operator to swim along (Fig. 5). At the completion of each pass the divers moved the swim line by ∼0.75 m, following the ropes laid along the sea-bed, thus ensuring sufficient overlap in the image data of the areas being mapped (Fig. 6).
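
The ∼0.75 m shift between swim lines can be related to the image footprint and the target overlap. A minimal sketch is given here, assuming the 1.5 m side of the footprint lies across the track and that the survey box is 10 m wide (the spacing of the sea-bed ropes). Both are assumptions, and the function names are illustrative.

```python
import math


def trackline_spacing(swath_width_m, overlap_fraction):
    """Distance between adjacent tracklines for a given side overlap."""
    return swath_width_m * (1.0 - overlap_fraction)


def tracklines_needed(area_width_m, swath_width_m, overlap_fraction):
    """Number of parallel tracklines needed to cover the full box width."""
    spacing = trackline_spacing(swath_width_m, overlap_fraction)
    # The first line covers one swath; every further line adds one spacing.
    remaining = max(0.0, area_width_m - swath_width_m)
    return 1 + math.ceil(remaining / spacing)


spacing_m = trackline_spacing(1.5, 0.5)
lines = tracklines_needed(10.0, 1.5, 0.5)

print(f"trackline spacing: {spacing_m:.2f} m")
print(f"tracklines for a 10 m wide box: {lines}")
```

The trade-off described in the text is visible in the formula: raising the overlap from 50% to 75% halves the spacing and roughly doubles the number of passes, and therefore the survey and processing time.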

Figure 5.

An operator with snorkel propels the diver rig using a guide line for navigation. (Jon Henderson)

Figure 6.

Taut swim-lines in the water were maintained and moved by divers at the completion of each pass. (Jon Henderson)

The diver rig operated at 2 Hz and, at the speeds the swimmer was moving, resulted typically in 3–5 stereo views of a scene point ‘along track’. The 50% overlap between parallel ‘tracklines’ provides another 3–5 stereo views (half of trackline 5 will overlap with half of trackline 4, the other half of trackline 5 will overlap with trackline 6). Each scene point is therefore observed in 6–10 stereo pairs. The final camera pose (position and orientation) estimates are made using all relative pose observations (between that camera's view and all neighbouring views that observed enough common scene points to calculate a relative pose between the cameras).
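
The view counts above can be reproduced from the frame rate, the footprint and the swimming speed. A minimal sketch follows, assuming the 1.2 m side of the footprint lies along the track and swimming speeds of 0.5 to 0.8 m/s. Neither value is given in the field report, and both are chosen only for illustration.

```python
def views_along_track(footprint_m, frame_rate_hz, speed_m_s):
    """Number of consecutive frames in which a scene point stays in view."""
    return footprint_m * frame_rate_hz / speed_m_s


def tracklines_covering_point(overlap_fraction):
    """Average number of adjacent swaths that contain a given point."""
    return 1.0 / (1.0 - overlap_fraction)


ALONG_TRACK_FOOTPRINT_M = 1.2  # assumed orientation of the footprint
FRAME_RATE_HZ = 2.0

for speed in (0.5, 0.6, 0.8):  # assumed swimming speeds in m/s
    along = views_along_track(ALONG_TRACK_FOOTPRINT_M, FRAME_RATE_HZ, speed)
    total = along * tracklines_covering_point(0.5)
    print(f"{speed:.1f} m/s: {along:.1f} views along track, {total:.1f} in total")
```

Under these assumptions the along-track count falls in the 3–5 range and the total in the 6–10 range quoted in the text.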

Surveying was found to be most effective when performed shortly after sunrise and before sunset due to the lighting conditions. When the sun was higher in the sky, the acquired images were impaired by caustic lighting effects produced by ripples and waves on the ocean surface. In addition to making the images more difficult to interpret, this also reduced the performance of the feature extraction and matching algorithms used by the mapping software. As a result, photo surveys, each lasting about an hour, were carried out just after dawn and just before dusk each day.

Post processing for 3D photo meshes

Once the images and related time-stamped sensor data have been collected in the field, the logged data can be downloaded from the diver rig. The images collected need to be post-processed before they can begin to be used to construct accurate 3D photo-mosaics. First, the images undergo correction for vignetting, which is the fall-off in brightness towards the edges of an image (Kim and Pollefeys, 2008). Following this, the image is white balanced following a simple grey-world model (Lam, 2005) to partially compensate for the colour-dependent absorption of light in water (Fig. 7).
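
The two correction steps can be illustrated on a small synthetic image. This is a sketch only: the radial fall-off model and its strength are assumptions chosen for the example (in practice the vignetting profile is estimated from the imagery), and the function names are hypothetical. The grey-world step scales each colour channel so that the three channel means become equal.

```python
def radial_gain(y, x, rows, cols, strength):
    """Assumed fall-off: 1 at the image centre, 1 - strength at the corners."""
    cy, cx = (rows - 1) / 2.0, (cols - 1) / 2.0
    r2 = ((y - cy) ** 2 + (x - cx) ** 2) / (cy ** 2 + cx ** 2)
    return 1.0 - strength * r2


def correct_vignetting(image, strength):
    """Undo the assumed fall-off by dividing each pixel by its gain."""
    rows, cols = len(image), len(image[0])
    return [
        [tuple(c / radial_gain(y, x, rows, cols, strength) for c in pixel)
         for x, pixel in enumerate(row)]
        for y, row in enumerate(image)
    ]


def grey_world(image):
    """Scale each channel so all three channel means equal the overall mean."""
    pixels = [pixel for row in image for pixel in row]
    means = [sum(p[ch] for p in pixels) / len(pixels) for ch in range(3)]
    grey = sum(means) / 3.0
    scales = [grey / m for m in means]
    return [[tuple(c * s for c, s in zip(pixel, scales)) for pixel in row]
            for row in image]


# Synthetic image: a uniform sea-bed with a blue-green cast (weak red channel),
# darkened towards the corners by the assumed vignetting model.
ROWS, COLS, STRENGTH = 5, 7, 0.4
scene_colour = (0.2, 0.5, 0.6)  # (red, green, blue)
raw = [[tuple(c * radial_gain(y, x, ROWS, COLS, STRENGTH) for c in scene_colour)
        for x in range(COLS)] for y in range(ROWS)]

flat = correct_vignetting(raw, STRENGTH)
balanced = grey_world(flat)

print("corner pixel before:", raw[0][0])
print("corner pixel after: ", balanced[0][0])
```

The grey-world assumption only partially compensates for attenuation, as noted above, because the true loss of red light depends on the distance travelled through the water and not just on the image average. Real images would also need clipping to the valid intensity range after scaling.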

Figure 7.

Image processing. The original image a) displays strong vignetting (dark corners), and a blue-green hue caused by the rapid attenuation of red light. An image first undergoes vignetting correction b), then colour balancing c). (Authors)

The overlapping photo imagery is then assembled together using the SLAM algorithm (Mahon et al., 2008), which accurately calculates the poses of the camera at which the images were taken. To generate the pose information the SLAM algorithm automatically extracts common 2D feature points from each stereo image pair and uses this information, combined with the GPS measurements, attitude (heading, pitch and roll) and depth measurements recorded by the sensors on the diver rig at the moment each photo pair was taken, to estimate the positions and orientations of the cameras in relation to the terrain itself. SLAM recognizes visual observations of common features across multiple images, such as overlapping images in the direction of travel, or across parallel tracklines, or when ‘closing a loop’ and coming back to an area already imaged, and forces a solution to camera poses that maps all views of the same feature to the same 3D scene point. This allows SLAM to estimate camera poses (and the triangulated terrain they imply) in a manner consistent both with the features observed in the cameras, minimizing doubled-up features, ghosting and mis-registrations, and with the sensor readings from the GPS, attitude and depth instruments. For this project, a regular GPS collecting observations at approximately 1 Hz was used which, although noisy, served to place the visual reconstruction in its approximate geo-referenced location. The final registration was then performed by manually aligning the data to the Total Station survey. In other situations where such baseline survey data is lacking a >10 mm accuracy differential GPS system could be used.

The position of the cameras and the positions of the points on the individual stereo photo pairs are triangulated together to create a common 3D reference frame from which Delaunay triangulated meshes are created (Johnson-Roberson et al., 2010). Due to the fact these individual meshes are created within a common reference frame, they can then be fused together to create a single large 3D mesh (Curless and Levoy, 1996; Johnson-Roberson et al., 2010: 33–37). As the mesh is derived from the overlapping imagery it is then possible to create a photo-textured 3D model by projecting the images on to the mesh. Blending the details in the images together over multiple spatial frequency bands (Burt and Adelson, 1983) helps produce visually consistent textures (Fig. 8) compensating for inconsistencies in illumination and the small registration errors that arise when projecting images on to an approximate 3D structure.
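
The benefit of blending over several frequency bands can be shown on a one-dimensional example. The sketch below is a simplified two-band version of the idea, not the full multi-level pyramid of Burt and Adelson (1983) and not the project's own code: low frequencies (overall brightness) are mixed across a wide transition zone, while high frequencies (fine texture) are switched sharply at the seam so that detail is not blurred. The box filter, the signal values and all names are illustrative assumptions.

```python
def box_blur(signal, radius):
    """Moving average with a window that shrinks at the ends of the signal."""
    n = len(signal)
    out = []
    for i in range(n):
        lo, hi = max(0, i - radius), min(n, i + radius + 1)
        out.append(sum(signal[lo:hi]) / (hi - lo))
    return out


def two_band_blend(a, b, seam, radius):
    """Blend signals a (left of seam) and b (right of seam) in two bands."""
    n = len(a)
    hard = [1.0 if i < seam else 0.0 for i in range(n)]  # weight given to a
    soft = box_blur(hard, radius)                        # smoothed weight
    low_a, low_b = box_blur(a, radius), box_blur(b, radius)
    out = []
    for i in range(n):
        high_a, high_b = a[i] - low_a[i], b[i] - low_b[i]
        low = soft[i] * low_a[i] + (1.0 - soft[i]) * low_b[i]
        high = hard[i] * high_a + (1.0 - hard[i]) * high_b
        out.append(low + high)
    return out


# Two views of the same fine texture with different overall brightness,
# standing in for overlapping images taken under inconsistent illumination.
N, SEAM, RADIUS = 40, 20, 5
texture = [0.05 if i % 2 else -0.05 for i in range(N)]
view_a = [0.5 + t for t in texture]
view_b = [0.7 + t for t in texture]

naive = view_a[:SEAM] + view_b[SEAM:]
blended = two_band_blend(view_a, view_b, SEAM, RADIUS)


def largest_brightness_step(signal):
    """Largest jump between neighbours once the texture is removed."""
    base = [s - t for s, t in zip(signal, texture)]
    return max(abs(base[i + 1] - base[i]) for i in range(len(base) - 1))


print(f"largest step, naive cut:  {largest_brightness_step(naive):.3f}")
print(f"largest step, two bands:  {largest_brightness_step(blended):.3f}")
```

In this example the brightness difference is spread over the width of the smoothing window instead of appearing as a visible seam, while the fine texture is carried through unchanged.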

Figure 8.

The effect of blending on a section of the reconstruction. a) Red circles indicate obvious inconsistencies and seams; b) Green circles point out improved textures through multi-band blending. (Authors)

Because this approach recovers a 3D mesh from each stereo pair by triangulating common features across the pair, the visual texture (image) is already properly registered to the structure and can be mapped back on to that 3D mesh with little distortion. The distortion introduced by simplifying the 3D structure to a facet in a triangulated mesh is typically much less than the size of the facets, which are ∼10–20 mm in size. This compares very favourably to the typical approach to producing a mosaic that assumes an essentially planar scene. Such approaches usually result in large distortions across the mosaic because motion parallax from 3D structures cannot be modelled adequately, hence the common warping of mosaics after stitching a few images.

The time taken to process the images was found to scale with the amount of survey time, typically taking two to three times as long as the time spent in the field. Given that at Pavlopetri two hour-long surveys were carried out per day, processing usually took between four and six hours to complete, meaning it was typically possible to have a preliminary 3D mosaic the same or following day as the survey progressed.

Manipulating the amount of data produced is more of a challenge. Since many thousands of images are acquired and processed, the quantity of data for the whole or parts of the site can often be larger than the available memory on a computer used for visualization. To overcome this, a visualization engine was used to view the data. The engine employs a discrete level of detail (LOD) renderer in which several separate simplifications of the geometry and texture data are generated (Clark, 1976), allowing for a quick transition from a low-resolution overview of the entire site to high-resolution detail of particular parts.
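
The principle of a discrete LOD scheme can be sketched in a few lines: for a given view, choose the coarsest pre-computed simplification whose detail is still at least as fine as one screen pixel. The set of levels, the screen width and the function name below are illustrative assumptions and do not describe the actual visualization engine.

```python
def select_lod(level_resolutions_mm, view_width_m, screen_width_px):
    """Index of the coarsest level that still resolves one screen pixel.

    level_resolutions_mm must be ordered from finest to coarsest.
    """
    mm_per_screen_pixel = view_width_m * 1000.0 / screen_width_px
    chosen = 0  # fall back to the finest level when zoomed right in
    for index, resolution_mm in enumerate(level_resolutions_mm):
        if resolution_mm <= mm_per_screen_pixel:
            chosen = index
    return chosen


# Assumed pre-computed levels, each half the resolution of the previous one.
LEVELS_MM = [1, 2, 4, 8, 16, 32]
SCREEN_PX = 1920

for view_width_m in (1.0, 30.0, 100.0):
    level = select_lod(LEVELS_MM, view_width_m, SCREEN_PX)
    print(f"view {view_width_m:>5.1f} m wide -> level {level} "
          f"({LEVELS_MM[level]} mm per texel)")
```

Only the selected level needs to be held in memory for the visible region, which is what allows a site-wide overview and a close-up of a single feature to be served from the same dataset.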


During the 2010 field season, 14–25 June, a total of 47 snorkel swims with the diver rig were carried out over the submerged remains at Pavlopetri. A typical swim surveyed a grid area of approximately 15 × 10 m, gathered around 2,800 pairs of stereo images and took approximately one hour to complete. In total over 135,000 pairs of stereo images were taken, covering more than 7,000 sqm of the site.

The SLAM-estimated camera poses for a composite of three neighbouring survey boxes (4, 5 and 6) are shown in Figure 9. The size of the combined survey region is approximately 15 × 30 m (450 sqm), in which 6,315 stereo image pairs were acquired. These images were collected in just three hours of field time and represent 35 GB of data. Processing time for the combined dives was approximately six hours. After the first pass of SLAM processing (Fig. 9a), 197 visual ‘loop-closure’ observations were used. ‘Loop closure’ refers to the re-observation of the same point at a later time (the same scene point on different photos). Using the instrumentation alone, estimates of the trajectory of the diver rig can drift, but once a ‘loop closure’ is recognized a correction is propagated ‘back’ to ensure the path is consistent with the observations. Since the search for and use of loop-closures is computationally expensive, the SLAM tool is configured to use only a subset of informative loop-closures (those that constrain the path of the rig) while incrementally estimating the path. An additional map-optimization step searches more exhaustively for additional loop-closures once the rough path has been determined. After the map-optimization step (Fig. 9b), a total of 126,175 loop-closure observations were found and used by SLAM to improve the 3D mesh. While the map-optimization procedure does not make large changes to the estimated poses, small refinements improve the consistency of the generated 3D models.
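
The way a loop closure pulls a drifting trajectory back into line can be illustrated with a one-dimensional toy problem. The sketch below is a batch least-squares example with a closed-form solution, not the filter described by Mahon et al. (2008). The odometry bias, the weight given to the loop closure and the function name are all assumptions made for the example.

```python
def close_loop(odometry_steps, loop_measurement, loop_weight):
    """Least-squares path from step measurements and one loop closure.

    Minimizes the sum of squared changes to each step plus loop_weight
    times the squared mismatch between the end-to-start displacement and
    loop_measurement. With equal step weights every step receives the
    same correction.
    """
    n = len(odometry_steps)
    residual = loop_measurement - sum(odometry_steps)
    correction = loop_weight * residual / (1.0 + loop_weight * n)
    poses = [0.0]
    for step in odometry_steps:
        poses.append(poses[-1] + step + correction)
    return poses


# The rig swims 4 m out and 4 m back, but each measured step carries an
# assumed bias of +0.05 m, so dead reckoning ends 0.4 m from the start.
steps = [1.05] * 4 + [-0.95] * 4

dead_reckoned = close_loop(steps, 0.0, 0.0)  # weight 0: instruments only
corrected = close_loop(steps, 0.0, 100.0)    # imagery recognizes the start

print(f"end position, instruments only:  {dead_reckoned[-1]:.3f} m")
print(f"end position, with loop closure: {corrected[-1]:.3f} m")
```

The correction is shared among all the steps in the loop and is not applied only at the end, which corresponds to the correction being propagated 'back' along the path. With many loop closures, as after map optimization, each part of the trajectory is constrained by several such observations at once.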

Figure 9.

SLAM estimated camera poses and stereo-vision relative pose observations for the composite of survey boxes 4, 5 and 6. a) Before map optimization, 297 loop closure observations were established between 6,315 poses; b) after map optimization, a total of 126,175 loop closure observations were applied to the filter. (Authors)

Overhead views of the 3D model generated for boxes 4, 5 and 6 are shown in Figure 10. The stone walls of the site can be clearly seen in the depth map view (Fig. 10a) while the texture mapped mesh (Fig. 10b) provides a more traditional photo-mosaic view. The visible structures agree well with the Total Station vector line survey of the area, shown in Figure 10c, with a mean discrepancy between the two techniques of 50 mm. The largest discrepancy between the two techniques was found to be 200 mm. The typical accuracy of the stereo geometry, when considering the ability to localize the corners of a checkerboard after calibration (Johnson-Roberson et al., 2010), is ±20 mm in depth and ±10 mm in the horizontal plane for the range of distances used on this survey (∼1–3 m).
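
The order of magnitude of the depth accuracy can be checked against the standard stereo relation, in which depth uncertainty grows with the square of the range and falls with baseline and focal length. This is a rough sketch under stated assumptions: an ideal pinhole model, a focal length in pixels derived from the 42 deg in-water field of view and an assumed image width of 1360 pixels, and an assumed feature-matching error of half a pixel. It is not the calibration procedure used in the project.

```python
import math


def focal_length_px(image_width_px, fov_deg):
    """Pinhole focal length in pixels from the horizontal field of view."""
    return (image_width_px / 2.0) / math.tan(math.radians(fov_deg) / 2.0)


def depth_error_m(range_m, baseline_m, focal_px, disparity_error_px):
    """First-order depth uncertainty of a rectified stereo pair.

    From Z = f * B / d it follows that |dZ| = Z^2 / (f * B) * |dd|.
    """
    return range_m ** 2 / (focal_px * baseline_m) * disparity_error_px


BASELINE_M = 0.075         # stereo baseline from Table 1
IMAGE_WIDTH_PX = 1360      # assumed pixel grid
DISPARITY_ERROR_PX = 0.5   # assumed matching accuracy

focal_px = focal_length_px(IMAGE_WIDTH_PX, 42.0)
for range_m in (1.0, 2.0, 3.0):
    error_mm = depth_error_m(range_m, BASELINE_M, focal_px,
                             DISPARITY_ERROR_PX) * 1000.0
    print(f"range {range_m:.0f} m: depth uncertainty about {error_mm:.0f} mm")
```

Under these assumptions the estimate is of the same order as the depth figure quoted above, and the quadratic growth with range is one reason why a short 75 mm baseline is adequate only for the close working distances used on this survey.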

Figure 10.

Map of the survey area of boxes 4, 5 and 6 produced by the diver rig. Remains of walls are clearly visible in both a) the depth coloured view and b) the texture mapped view; c) the corresponding area produced by the Total Station system showing good agreement with the layout of the major features of the site; d) shows a detailed texture mapped view of a cist grave. (Authors)

To test the operation of the system with no spatial controls (grid markers or navigation aids) the main submerged Chamber-tomb at the site was imaged by simply turning on the diver rig and repeatedly swimming over it. The large depth variations due to the rock-cut nature of the tomb provided a further challenge; the sides of the tomb were imaged by diving down and turning the diver rig on its side to fully image the sides of the chamber (Fig. 11). Although some holes in the mesh are present, excellent results were obtained from a short free-swimming mission of only 20 minutes duration (Fig. 12).

Figure 11.

Free swimming with the diver rig over Chamber-tomb 1 at Pavlopetri. (Jon Henderson)

Figure 12.

a) Depth and b) texture meshes of Chamber-tomb 1. The dimensions of the chamber are approximately 4 m by 5 m and the depth is 3 m at its deepest point. (Authors)

The resolution of the photo imagery (approximately 1 mm/pixel at 2 m from the bottom) is high enough to allow the recognition of individual ceramic sherds and finds on the sea-bed. As a result new discoveries have been made from studying the mosaics such as the identification of what appeared to be the rim of a large pithos storage vessel protruding from the sand just outside Building 10 (Fig. 13). Diving confirmed the location of the vessel which appeared to be in situ suggesting that at least a metre of deposit survived around the vessel in this area (Fig. 14).

Figure 13.

a) Depth mesh and b) texture map view of Building 10 including detail (inset) of the top of a large pithos vessel in situ. (Authors)

Figure 14.

The existence of the in situ pithos outside Building 10, first identified from the texture mesh, was later confirmed by diving. (Jon Henderson)

The texture map and depth meshes can be easily exported as geo-referenced TIFFs into computer illustration and GIS packages such as Adobe Illustrator and ArcGIS to create accurate stone-by-stone archaeological plans of the remains using the strength of digital accuracy and coverage (Fig. 15). In this way traditional 2D architectural and archaeological survey plans can be created. GIS packages have the advantage of retaining the geo-referencing information of the meshes, facilitating the construction of overall plans of the site from individual geo-referenced photo-mosaic sections, which can be placed within the survey using the Total Station vector-line data as a control. Achieving positioning detail in the order of centimetres, the technique can be used not just to complement the hand-drawn record but to replace it (Holt, 2003). Areas can be quickly planned using the diver rig, allowing the project to continue without long delays while hand-drawn and tape-measured plans are carried out, a particular advantage during underwater excavation.

Figure 15.

Producing traditional outputs. Tracing stones and wall features from a) texture and depth meshes in Adobe Illustrator and b) a final architectural plan view of Building 10. (Authors)

As well as 2D outputs, the textured polygonal meshes can be interactively presented in 3D using a realtime visualization engine. A 3D interactive rendering system, which we call benthicQT, has been developed around OpenSceneGraph. This lightweight application allows for interactive flythrough, movie recording, and 3D-distance measuring, all on dynamic level-of-detail models (Fig. 16). Care was taken to allow this software to run on basic hardware, enabling a wide range of end-users to view the models on low-powered laptops. By only displaying structure and textures that are matched to the screen resolution, the user can visualize the reconstruction in a responsive environment without noticeable loss of quality. This allows a user to get a global view of the whole scene spanning tens or hundreds of metres, as well as smoothly zoom into detail of a particular sherd in an area only a few centimetres across. The fully interactive, photo-realistic models produced are rich in detail and present a close simulation of real-life conditions. As such they are an ideal way to share the site with other experts for interpretation, as well as with the general public for outreach purposes, as they need no specialist knowledge to be instantly contextualized and understood.

Figure 16.

3D view of Building 9 constructed from edge set slabs and measuring 8 × 5 m. All of the meshes produced can be fully manipulated in a 3D environment using a realtime visualization engine. Submerged structures can be examined in a) depth and b) photorealistic texture views. (Authors)

Future work

All components of the visual mapping system presented in this paper, particularly the post-processing pipeline, are presently being refined. The diver-rig structure itself is being redesigned to reduce weight and drag. While the work in 2010 demonstrated that the diver rig is an ideal tool for recording archaeological features in shallow water, it remains to be tested in deeper water. For operation away from the surface, geo-referencing cannot depend, as it did at Pavlopetri, on a GPS receiver fixed to the rig. Acoustic beacons, or registration of 3D structure from stereo to acoustic bathymetry, are options regularly used in marine robotics that could be applied on archaeological sites by divers using the diver rig. Problems also remain in managing and presenting the large quantities of image data produced by the diver rig. At present we can produce high-definition reconstructions of sections of the site covering areas from 1 to 10,000 sqm at any one time, but further work is needed to combine all dives into a single reconstruction consisting of more than 135,000 stereo image pairs which retains enough resolution to be archaeologically useful while remaining manageable. This will require further refinements to the way the visual data is presented, with resolution depending on how much of the site is being viewed at one time.

Since 2011, the visual mapping system has been deployed at Pavlopetri in a small AUV to cover a greater area of sea-bed and generate a photo map and 3D model of the entire site. It could be argued that the relatively flat sea-bed conditions at Pavlopetri and the abundance of man-made features such as edges, corners and straight walls at the site facilitated the post-processing and creation of accurate photo-mosaics. The operational principles of the system are sound but we are aware of the need to test it on a range of underwater archaeological sites in differing terrain to determine its wider application and utility. As a result we plan to further refine the mapping module as part of an AUV configuration on a number of archaeological sites including shipwrecks, substantial submerged landscapes and deep-water features in the next few years.


The survey work using the diver rig at Pavlopetri in 2010 was the first time a stereovision SLAM system operated by humans had been used to systematically record an archaeological site. The diver rig is an adaptation of existing robotic platform optical technology to execute high precision, human-controlled underwater archaeological survey. With data collection being similar to operating a regular underwater camera, it is a logistically simpler (and certainly more economical) alternative to deploying an AUV or ROV system and demonstrates that such technology need not be limited to large-budget deep-water operations.

The ability to capture the data to produce detailed plans and accurate photo-mosaics of submerged sites quickly and with minimum fuss in the field has obvious advantages for underwater archaeological projects which regularly operate under tight time and financial constraints. Over 10 days, with an operational time in the water of just 2–3 hours per day, it was possible to accurately photo map 7200 sqm of the submerged town at Pavlopetri in stone-by-stone detail. Given that the diver rig can be deployed from the shore and can cover large areas, it is also relatively inexpensive to use when compared with the time and costs required for both detailed underwater survey using divers and ship-based multibeam technology. More importantly, the results produced can be held to the same standards for documentation and precision as achieved on terrestrial sites. The human-controlled aspect, in which archaeological divers are directly involved in the data capture, ensures that areas of archaeological importance are covered in sufficient detail. It provides the opportunity for ‘on the fly’ adaptive behaviour such as following particular features or collecting more data over particularly important areas, which would be challenging to execute reliably with an autonomous robotic system. The trade-off, however, is that reliance on diver propulsion results in substantially lower coverage than could be achieved in the same time using an AUV.

The production of accurate photo-realistic models of submerged features which can be easily interpreted by archaeologists and the public alike is a major step forward in the dissemination and presentation of underwater archaeological data. It has the potential to allow the wider non-diving public to experience and investigate submerged sites, helping them to more readily appreciate the richness and value of the underwater resource. The technique described here could play a major role in the future recording, management and protection of submerged sites. Repeated photographic stereo-mapping would be an ideal way to monitor underwater sites, from submerged settlements to shipwrecks, as it offers the ability to recognize small-scale changes over time from survey to survey. Using traditional methods, such an approach would be extremely slow and inaccurate, while using AUVs might be prohibitively expensive. The diver rig offers a useful and practical solution to underwater survey, enabling rapid, accurate and easily repeatable site recording.

Every year technical advances translate into higher accuracy and richer data at less cost for the user. While the strengths of digital and robotic technology continue to improve, its limiting aspects in terms of costs and utility for underwater archaeology are lessening. As a result, carrying out digital survey of submerged remains can quickly move from a specialist technical endeavour to common practice, a change that all underwater archaeologists should embrace if the discipline is to continue to develop.


  1. The biggest difference from an AUV configuration, apart from lack of propulsion, was the absence of a Doppler Velocity Log (DVL) which provides very accurate motion estimates in real-time for automated vehicles so they can follow a desired trajectory. This data can be used to complement image-based motion calculations. A swath bathymetry unit, such as an Imagenex DeltaT 260 kHz, can also be used to complement the data. However, it was felt that this extra cost was not necessary in terms of achieving further accuracy or outputs. In this application the detailed bathymetric maps produced depend on the visual maps because in the process of creating the 3D mosaic we estimate the camera poses (using SLAM). If swath bathymetry were also carried out, it would be on these camera poses that we would then ‘hang’ the multibeam range data/swaths to create a cloud of 3D acoustic soundings. These could then be turned into a Digital Elevation Model (DEM) by binning.
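
The two steps described in this note, hanging swath data on estimated poses and binning the resulting soundings into a DEM, can be sketched as below. This is a simplified illustration rather than the processing used in this project: the function names are hypothetical, each pose is reduced to a position plus a heading (a full pose would also carry roll and pitch), and each DEM cell simply takes the mean of the soundings that fall within it.

```python
import math
from collections import defaultdict


def hang_swath(pose, swath):
    """Transform sensor-frame soundings into the site frame.

    pose  : (x, y, z, yaw) of the rig, yaw in radians (simplified, assumed form)
    swath : list of (sx, sy, sz) soundings relative to the rig
    """
    px, py, pz, yaw = pose
    c, s = math.cos(yaw), math.sin(yaw)
    # Rotate each sounding about the vertical axis, then translate by the pose.
    return [(px + c * sx - s * sy, py + s * sx + c * sy, pz + sz)
            for sx, sy, sz in swath]


def bin_soundings(points, cell_size):
    """Bin (x, y, z) soundings into square cells and average z in each cell."""
    sums = defaultdict(lambda: [0.0, 0])
    for x, y, z in points:
        key = (math.floor(x / cell_size), math.floor(y / cell_size))
        sums[key][0] += z
        sums[key][1] += 1
    # The DEM maps (column, row) cell indices to the mean elevation in that cell.
    return {key: total / count for key, (total, count) in sums.items()}


# Two hypothetical poses along a track, each with a three-beam swath.
poses = [(10.0, 20.0, -3.0, 0.0), (10.5, 20.0, -3.0, 0.0)]
swath = [(0.0, -1.0, -1.2), (0.0, 0.0, -1.0), (0.0, 1.0, -1.4)]

cloud = [p for pose in poses for p in hang_swath(pose, swath)]
dem = bin_soundings(cloud, cell_size=1.0)
print(dem)
```

The cell size controls the trade-off between DEM resolution and the number of soundings averaged per cell; the 1 m value here is an arbitrary choice for the example.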


This work was supported by the Australian Research Council Centre of Excellence programme, the Institute of Aegean Prehistory, The University of Nottingham, The British School at Athens, The Hellenic Ministry of Culture and The Municipality of Voiai. Special thanks to Dr Angeliki Simosi (Director of the Ephorate of Underwater Antiquities), Elias Spondylis (the Greek Director of the Pavlopetri Project), Aggelos Mallios, Dimitris Sakellariou and the Hellenic Centre for Maritime Research for their help in facilitating operations at Pavlopetri. Without the support of Professor Cathy Morgan, Tania Gerousi, Chrysanthi Gallou and the diver-rig field team (Gemma Hudson, Peter Campbell, Kirsten Flemming, Ariell Friedman, Robin Harvey, Derek Irwin and Stefan Williams) the work could not have taken place.