A modified Mask region‐based convolutional neural network approach for the automated detection of archaeological sites on high‐resolution light detection and ranging‐derived digital elevation models in the North German Lowland

Due to complicated backgrounds and unclear target orientation, automated object detection is difficult in the field of archaeology. Most of the current convolutional neural network (CNN) object‐oriented detection techniques are based on a faster region‐based CNN (R‐CNN) and other one‐stage detectors that often lack adequate processing speeds and detection accuracies. Recently, the two‐stage detector Mask R‐CNN technique achieved impressive results in object detection and instance segmentation problems and was successfully applied in the analysis of archaeological airborne laser scanning (ALS) data. In this study, we outline a modified Mask R‐CNN technique that reliably and efficiently detects relict charcoal hearth (RCH) sites on light detection and ranging (LiDAR) data‐based digital elevation models (DEMs). Using image augmentation and image preprocessing steps combined with the deep learning‐based adaptive gradient method with a dynamic bound on the learning rate (AdaBound) optimization technique, we could improve the model's accuracy and significantly reduce its training time. We use DEMs based on high‐resolution LiDAR data and the visualization for archaeological topography (VAT) technique that give images with a very strong contrast of the terrain and the outline of the sites of interest in the North German Lowland. Therefore, the model can identify RCH sites with an average recall of 83% and an average precision of 87%. Techniques such as the modified Mask R‐CNN method outlined here will help to greatly improve our knowledge about archaeological site densities in the realm of historical charcoal production and past human‐landscape interactions. This method provides an accurate, time‐efficient and bias‐free large‐scale site mapping option not only for the North German Lowland but potentially for other landscapes as well.


| INTRODUCTION
The increasing availability of airborne high-resolution light detection and ranging (LiDAR) data has led to an ever-growing interest in applying remote sensing to the archaeological domain, in which the use of machine learning techniques is now also increasing (e.g., Cowley et al., 2020; Davis, 2018; Opitz & Herrmann, 2018).
Considering the trends in the developments of neural networks in recent years, deep learning techniques have initiated a significant change in the field of computer vision. Catalyzed by the use of convolutional neural networks (CNNs), strong advances in many classic visual inspection tasks have been made recently, for example, in object detection, object localization, semantic segmentation and object instance segmentation tasks (He et al., 2016; Krizhevsky et al., 2012; Simonyan & Zisserman, 2014; Zagoruyko et al., 2016).
Focussing on neural networks for object detection, progress made in recent years is almost completely linked to the basic CNN-based model and its extensions, namely, the region-based CNN (R-CNN), the fully convolutional network, the Fast R-CNN and the more efficient variant Faster R-CNN (Girshick, 2015; Girshick et al., 2014; Long et al., 2015; Ren et al., 2015). Mask R-CNN expands the Faster R-CNN architecture by adding an algorithmic branch for predicting an object segmentation mask parallel with the existing region proposal stage (He et al., 2017). A detailed description of the Mask R-CNN technique is given elsewhere (e.g., Kazimi et al., 2019). Because of its relatively easy trainability and efficiency, Mask R-CNN has seen a surge in popularity for use in object detection (Ahmed et al., 2020; Johnson, 2018; Sorokin, 2018; Yu et al., 2019) and in the analysis of archaeological airborne laser scanning (ALS) data (Gong & Zhang, 2020; Pham & Lefèvre, 2018; Verschoof-van der Vaart & Lambers, 2019).
In recent years, small anthropogenic landforms known as relict charcoal hearths (RCHs, sometimes also called charcoal-burning platforms or kilns), which are mainly found in forests and result from historical charcoal production, have attracted the attention of archaeologists and soil scientists in the North German Lowland (Raab et al., 2015). RCHs are part of the so-called sociocultural fingerprint of landscapes (Tarolli et al., 2019) and an important source of anthracological information (Gocel-Chaltè et al., 2020; Smidt et al., 2017), providing insight into historical land use practices (e.g., Tolksdorf et al., 2020; Deforce et al., 2020).
Recent studies in soil science focus on RCH site-specific changes of soil chemical properties, such as increases in soil organic matter contents and element stocks (e.g., Donovan et al., 2021), changes in soil physical properties (e.g., [...]) and effects on vegetational and faunal growth (e.g., Buras et al., 2020; Gießelmann et al., 2019).
In flat terrain, RCHs are generally circular in shape, with a wide range of diameters (up to 30 m and averaging 12 m), are elevated several decimetres above the earth's surface and are often surrounded by a shallow circular ditch or multiple small pits. These morphological properties are favourable for detecting RCHs on LiDAR-based digital elevation model (DEM) visualizations because the circular elevation (positive feature) and the ditch (negative feature) form a strong visual contrast. Various studies manually mapped and digitized RCH sites, mostly by means of shaded relief visualizations (e.g., Carter, 2019; Deforce et al., 2013; Raab et al., 2019; Risbøl et al., 2013; Schmidt et al., 2016) or other visualization techniques (Hesse, 2010). Since high-resolution LiDAR data have become readily available for an increasing number of countries and RCH sites are found in increasingly larger forested areas, fully or semiautomated methods are required to decrease the workload of such labour-intensive manual mappings. Some previous (semi)automated mapping approaches for RCHs and other archaeological objects involve template matching (Schneider et al., 2014), geographic object-based image analysis (GEOBIA) (Witharana et al., 2018) and, more recently, deep learning techniques (e.g., Kazimi et al., 2020; Lambers et al., 2019; Trier et al., 2018; Trier et al., 2021; Verschoof-van der Vaart et al., 2020).
Mapping RCH sites shares similarities with mapping burial mounds, which are some of the most frequently studied archaeological sites globally (Davis, 2020). However, mounds vary more in shape than RCHs, having rectangular, triangular and trapezoidal elevation profiles (Davis et al., 2019), while RCH sites are predominantly circular elevations. Nonetheless, impressive mound mapping results using machine learning techniques have been achieved (e.g., Guyot et al., 2018; Caspari & Crespo, 2019).
Recently, Mask R-CNN has been used by Kazimi et al. (2019) for the DEM-based identification of archaeological objects such as bomb craters, charcoal hearths, and barrows, constituting a multiobject detection approach. Our study uses a modified Mask R-CNN approach, which we developed independently from the aforementioned study. We propose several modifications and extensions to the standard Mask R-CNN technique to (1) make it adapt more easily to new data and reduce overfitting, (2) minimize the training time of the model and (3) improve the model's accuracy by adding image preprocessing and augmentation steps. In doing so, we outline an improved method to detect charcoal hearths that can be easily applied to other objects in LiDAR DEMs.

| STUDY AREA
The study area is located in Lower Lusatia in the North German Lowland (Figure 1). It is covered by forests consisting mainly of pine and oak. The southwestern part of the area has been previously mapped based on LiDAR data, which revealed a relatively high density of RCHs of up to 440 RCH sites per square kilometre, but there are also areas featuring much lower densities (Raab et al., 2019). The relief throughout the area is disturbed by former military activities (trenches, bunkers, etc.). RCH sites in the area have been previously mapped and described based on older DEMs with comparably lower quality (1- and 2-m grid sizes) and ground surveys by [...] and Raab et al. (2019). In this study, we mapped RCHs in 20 subareas of 0.17 km² each, totalling 3.4 km². Ten training and 10 validation areas were selected to include a variety of RCH site densities and anthropogenic relief disturbances. The sites in the north are, on average, larger, have a lower spatial density and yield less anthropogenic disturbance on the adjacent relief than the sites to the south. The charcoal produced in the area provided resources for the nearby ironwork in Peitz, which operated from the mid-16th to the mid-19th century. Consequently, site dating using dendrochronology revealed that the RCH ages were between 1654 and 1852 (Raab et al., 2015; Raab et al., 2019). The bounding boxes were drawn to include the platforms and the ditch DEM signature.

| Image preprocessing and augmentation
We applied several image preprocessing steps to increase the success of object detection. First, we vectorized the maps and bounding boxes since the input data and targets must be tensors of floating point precision values. Then, we normalized the tonal range of the maps by converting the pixels from integer grayscale values ranging from 0 to 255 to floating point values and dividing by 255, resulting in final floating point values ranging from 0 to 1. Without the normalization step, the model can trigger large gradient updates that can prevent the network from converging (Sola & Sevilla, 1997). Furthermore, we applied nonlinear diffusion filtering (Perona & Malik, 1990) on the images to further enhance the RCH features. For this step, we set the processing time to 10 s with a lambda value of 0.5. To increase the number of sites and decrease model overfitting, we first augmented the 10 training area maps by using image transformations.
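The normalization and diffusion filtering steps described in this section can be sketched as follows. This is a minimal NumPy illustration, not the implementation used in the study: the function names, the exponential edge-stopping function and the parameters `n_steps`, `dt` and `kappa` are assumptions, and how the reported processing time and lambda value map onto them is not specified in the text.

```python
import numpy as np


def normalize_grayscale(img_uint8):
    """Convert 8-bit grayscale values (0..255) to float32 values in [0, 1]."""
    return img_uint8.astype(np.float32) / 255.0


def perona_malik(img, n_steps=50, dt=0.2, kappa=0.5):
    """Explicit Perona-Malik diffusion on a 2-D image (hypothetical parameters).

    dt is the integration step (at most 0.25 for stability with 4 neighbours)
    and kappa is the contrast parameter of the edge-stopping function.
    """
    u = img.astype(np.float32).copy()
    for _ in range(n_steps):
        # Replicate the border so that no flux crosses the image edge.
        padded = np.pad(u, 1, mode="edge")
        # Differences to the north, south, west and east neighbours.
        diffs = (
            padded[:-2, 1:-1] - u,
            padded[2:, 1:-1] - u,
            padded[1:-1, :-2] - u,
            padded[1:-1, 2:] - u,
        )
        # Conductance exp(-(d / kappa)^2) is small across strong edges,
        # so edges are preserved while flat regions are smoothed.
        flux = sum(np.exp(-((d / kappa) ** 2)) * d for d in diffs)
        u += dt * flux
    return u


# Usage on a synthetic 8-bit image (placeholder data, not study data).
rng = np.random.default_rng(0)
raw = rng.integers(0, 256, size=(32, 32), dtype=np.uint8)
filtered = perona_malik(normalize_grayscale(raw))
```

Normalizing before filtering keeps the contrast parameter on the same 0 to 1 scale as the pixel values, which is why both steps are shown together here.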
The maps were duplicated randomly and modified by image rotation, translation and horizontal and vertical flipping. In this way, the number of training maps was increased from 10 to 200.
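The augmentation described above can be illustrated with a short NumPy sketch. It is an assumption-laden example rather than the study's code: the function names, the restriction to rotations by multiples of 90°, the wrap-around translation via `np.roll` and the `max_shift` parameter are all hypothetical, and the maps are assumed to be square. In a real detection pipeline the bounding boxes and masks would have to be transformed in the same way, which is omitted here.

```python
import numpy as np


def augment(image, rng, max_shift=8):
    """Return a randomly flipped, rotated and translated copy of a square map."""
    out = image
    if rng.random() < 0.5:
        out = np.fliplr(out)  # horizontal flip
    if rng.random() < 0.5:
        out = np.flipud(out)  # vertical flip
    # Rotation by a random multiple of 90 degrees (assumed; angles not given).
    out = np.rot90(out, k=int(rng.integers(0, 4)))
    # Translation with wrap-around at the borders (a simplification).
    dy, dx = rng.integers(-max_shift, max_shift + 1, size=2)
    out = np.roll(out, shift=(int(dy), int(dx)), axis=(0, 1))
    return out.copy()


def build_training_set(maps, target=200, seed=0):
    """Duplicate randomly chosen maps with augmentation until `target` is reached."""
    rng = np.random.default_rng(seed)
    augmented = list(maps)
    while len(augmented) < target:
        source = maps[int(rng.integers(0, len(maps)))]
        augmented.append(augment(source, rng))
    return augmented


# Usage with 10 placeholder maps standing in for the training areas.
placeholder_rng = np.random.default_rng(42)
training_maps = [placeholder_rng.random((64, 64)) for _ in range(10)]
training_set = build_training_set(training_maps)
```

The original maps are kept in the output list, so the augmented copies add to the training data rather than replacing it.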

| Model validation
The CNN model's efficiency for an object recognition task is given as the mean average precision (mAP). The precision is the ratio of the number of true positives to the total number of positive detections, while the recall gives the ratio of the number of true positives to the total number of ground-truth objects (Henderson & Ferrari, 2016).
[...] that is, the model's runtime can be substantially reduced, as seen by the position of the lowest validation loss in Figure 5. For AdaBound, 28 epochs are required, while for SGD, 36 epochs are required. We propose to stop early after 15 epochs, which constitutes a runtime reduction of approximately 50%. This is not only to reduce the processing time that would otherwise not improve the model loss and accuracy but also to avoid model overfitting. To assess the effect of the image preprocessing steps on the model's performance, we analyzed the training and validation subsets again but without applying any preprocessing steps. The model's mAPs for the training and validation datasets are subsequently reduced to 79% and 74%, respectively. Therefore, the preprocessing steps increased the mAPs of the model by approximately 4% on average.
FIGURE 2 Example of modified VAT visualization (blend of sky-view factor, positive openness and hillshade visualizations) for Validation Area 3, showing circular RCH sites of various sizes, disturbances and visibilities. The smaller black dots are from pits resulting from historical military activity, and the larger black dots show pits that are most likely associated with hearth operation. RCH = relict charcoal hearth, VAT = visualization for archaeological topography
FIGURE 3 Overview of the image processing steps in the Mask region-based convolutional neural network (R-CNN) algorithm. ROI = region of interest
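The precision and recall definitions given in this section can be made concrete with a small worked example. The function name and the counts below are hypothetical and chosen only for illustration; they are not results from the study.

```python
def precision_recall(true_pos, false_pos, false_neg):
    """Compute precision and recall from detection counts.

    precision = TP / (TP + FP): share of detections that are real sites
    recall    = TP / (TP + FN): share of ground-truth sites that were detected
    """
    detections = true_pos + false_pos
    ground_truth = true_pos + false_neg
    precision = true_pos / detections if detections else 0.0
    recall = true_pos / ground_truth if ground_truth else 0.0
    return precision, recall


# Hypothetical counts: 80 correct detections, 20 false alarms, 20 missed sites.
precision, recall = precision_recall(true_pos=80, false_pos=20, false_neg=20)
print(f"precision = {precision:.2f}, recall = {recall:.2f}")
```

A model that reports few but reliable detections scores high on precision and low on recall, so both values are needed to judge a mapping result.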

| Model application
We applied the trained Mask R-CNN model to 10 application areas (Figure 6) containing 305 manually mapped RCH sites in total. This model detects RCH sites with an average recall of 83% and an average precision of 87% (Table 1). [...] sites, it is unclear why they were omitted by the algorithm, as their resolution seems to be on par with similar sites in the vicinity (e.g., in Area 9).
The Mask R-CNN method excels even in complex reliefs; for example, it reliably detects sites in Area 6 that feature an abundance of anthropogenic relief disturbances such as bomb craters and former gun emplacements (Figure 6). Furthermore, it detects sites in close spatial proximity (within 25 m of each other) or overlapping sites, as seen in Area 5. These spatial microclusters of sites are typical in the larger vicinity of the study area (Raab et al., 2019). Even sites that have been disturbed by forest management activities, sites with imperfect circular ditches and only partially visible sites on the map edges are detected by the algorithm.

| DISCUSSION
We have shown that the Mask R-CNN technique can be specifically

CONFLICT OF INTERESTS
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

DATA AVAILABILITY STATEMENT
The data that support the findings of this study are available from the corresponding author upon reasonable request.