Automatic segmentation of airport pavement damage by AM‐Mask R‐CNN algorithm

The airport is an important infrastructure for air transport and urban traffic. The airport pavement damage seriously affects the safety of aircraft take‐off and landing. Therefore, the regular detection of airport pavement damage is critical for aircraft take‐off and landing safety. However, for small target areas, the existing pavement damage detection methods cannot effectively achieve detection. In addition, the airport pavement detection is carried out under low light conditions at night, and the existing detection methods may not be able to accurately detect the damage boundary. To address the above problems, an automatic detection algorithm of Mask R‐CNN algorithm integrating attention mechanism (AM‐Mask R‐CNN) for airport pavement damage is proposed. First, the AM‐Mask R‐CNN is developed based on an improved Mask R‐CNN modified by the feature pyramid (FPN) and the dual attention mechanism are fused to extract subtle features in images. In addition, feature fusion is used. The macroscopic and microscopic features of airport pavement damage are organically combined to improve the segmentation sensitivity of small target areas and dark lighting. The experimental results show that the average F1‐score of the proposed model is 0.9489. In addition, the mean intersection over union for the model proposed is 0.9388. The average segmentation speed can reach 11.8 FPS. Moreover, compared with the traditional threshold segmentation method Fully Convolutional Networks (FCN) and DeepCrack segmentation method, the effectiveness of the proposed method is further proved.

Damage and foreign object debris (FOD) on the pavement are important factors that threaten the health of the pavement. 2,3 Cracks and potholes are common apparent damages to airport pavements, affecting the normal take-off and landing of aircraft. In addition, these damages can easily develop into slab fractures and pavement shedding. 4 FOD is easily sucked into aircraft engines or damages tires, causing great harm to the aircraft, 5 as shown in Figure 1. Therefore, according to the "Civil Airport Operation Safety Management Regulations," pavement inspections must be carried out by airport maintenance personnel every night, allowing early detection and resolution of pavement damage to prevent accidents. If the location of pavement damage can be found quickly and repaired in time, further deterioration of the pavement structure can be prevented; in addition, the accident rate and pavement maintenance costs can both be greatly reduced.
At present, domestic and foreign pavement detection methods mainly include manual inspection, digital image detection technology, and deep-learning-based detection methods. Manual inspection is a common method in pavement damage detection, relying on staff inspecting the airport pavement to determine damage information. Owing to the variety and quantity of pavement damage, this work is time-consuming and laborious; in addition, it is susceptible to the subjectivity of the inspectors, 6 whose ability and experience seriously affect the detection results.
With the development of scientific research, methods based on image processing and computer vision provide more effective means for pavement damage detection. 7,8 Common image processing methods include edge detection, threshold detection, and so on. Li et al. used an improved Otsu threshold algorithm to segment apparent damage on airport pavement; 9 however, this algorithm causes false detections in nondamaged areas such as wheel marks and oil stains in real scenes. Amhaz et al. proposed a shortest-path selection algorithm based on the connectivity between pixels of apparent airport damage. 10 The algorithm is robust, but its detection effect clearly depends on the choice of parameters. Shi et al. proposed the CrackForest algorithm, 10 which uses random structured forests to learn the structural features of cracks and generates an apparent-damage detector for airport pavement that can identify complex topological features. Ayenu-Prah et al. proposed a method to detect pavement cracks based on bidimensional empirical mode decomposition (BEMD) and the Sobel edge detector: 11 BEMD removes noise from the pavement images, and the Sobel edge detector then detects the cracks, improving detection accuracy. However, this method is not effective for images with low contrast between the background and target areas.
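As an illustration of the edge-detection step used in methods such as that of Ayenu-Prah et al., a minimal Sobel gradient-magnitude filter can be sketched in a few lines. This is a pure-NumPy sketch, not the cited authors' implementation; a real pipeline (e.g. OpenCV's `cv2.Sobel`) would add smoothing and thresholding.

```python
import numpy as np

def sobel_magnitude(img):
    """Gradient magnitude via 3x3 Sobel kernels (illustrative sketch)."""
    kx = np.array([[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]], dtype=float)
    ky = kx.T                      # vertical-gradient kernel
    h, w = img.shape
    padded = np.pad(img.astype(float), 1, mode="edge")
    gx = np.zeros((h, w))
    gy = np.zeros((h, w))
    for i in range(h):
        for j in range(w):
            win = padded[i:i + 3, j:j + 3]
            gx[i, j] = (win * kx).sum()   # horizontal gradient (correlation)
            gy[i, j] = (win * ky).sum()   # vertical gradient
    return np.hypot(gx, gy)

# A vertical step edge: the response peaks on the two boundary columns.
img = np.zeros((5, 6))
img[:, 3:] = 1.0
mag = sobel_magnitude(img)
```

On the toy step-edge image, the magnitude is large only along the boundary columns, which is exactly the behavior a crack detector exploits on high-contrast images and loses on low-contrast ones.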
Machine learning provides an important method for pavement damage detection, mainly including support vector machines 12,13 and neural networks. 14 Correia et al. proposed a pavement crack detection algorithm based on a support vector machine: 15 the image is first preprocessed, and the support vector machine then detects whether it contains cracks. This method effectively extracts the cracked region, but it has high requirements on image quality and cannot effectively extract crack contours from blurred images. Akagic et al. proposed an unsupervised crack detection method for asphalt pavement based on the gray histogram and the Otsu threshold method, achieving satisfactory performance at low signal-to-noise ratios. 16 In recent years, with the development of artificial intelligence technology, deep learning has been widely used in aerospace, industrial production, engineering structures, and other fields. [17][18][19][20][21] Crucially, deep learning provides a fast and efficient method for pavement damage detection. [22][23][24][25] Cao et al. proposed a seam crack detection method based on the VGG16 deep convolutional neural network. 26

FIGURE 1 Example of a flight incident caused by runway debris damage

Ali et al. proposed an autonomous unmanned aerial vehicle system integrated with an improved region-based convolutional neural network (Faster R-CNN) to identify various types of structural damage. 27 Mandal et al. proposed an automatic pavement damage analysis system based on the YOLO v2 deep learning framework. 28 Jiaxiu Dong et al. proposed a pavement crack segmentation method based on Mask R-CNN, which achieved effective segmentation of pavement cracks. 29 Kang et al. proposed a new semantic transformer representation network to achieve real-time segmentation of pixel-level cracks in complex scenes. 30 Yang et al.
proposed a crack detection model based on the feature pyramid network (FPN), 31 which fuses information from multiple convolutional layers to achieve fast segmentation of cracks. Zou et al. constructed an encoder-decoder-based airport damage detection model: 32 first, damage features at the same scale of the encoder and decoder are fused; then, the damage information at each scale is further fused so that feature information at different scales is effectively combined. Peigen Li et al. proposed a model combining semantic segmentation and edge detection to achieve effective detection of pavement cracks. 33 Guo Xiaojing et al. proposed an improved YOLO v3 algorithm 34 that adopts Darknet-49, with relatively lower complexity, as the feature extraction network; in addition, feature fusion is performed to improve FOD detection accuracy on airport pavement. Jiang Jin proposed identification and localization technology for foreign objects on airport pavement based on Faster R-CNN, 35 using the Faster R-CNN framework for foreign-object type identification.
However, the above methods cannot effectively handle small targets and dim images. In this study, a Mask R-CNN algorithm fused with an attention mechanism (AM-Mask R-CNN) is proposed to achieve efficient and accurate segmentation of airport pavement damage. First, the damage feature map is passed in parallel through the attention mechanism module and the FPN module. The attention mechanism improves the neural network's ability to understand the image, which helps extract subtle features; in addition, the macroscopic and microscopic features of the target are organically combined to improve sensitivity to small objects and the accuracy of small-target segmentation. Second, the RPN provides the classification network with regions of interest (RoIs) in the image, determining which regions of the inspection image contain damage such as cracks, potholes, and FOD. Furthermore, the position of each RoI box is fine-tuned in this network, so the extracted damage area is more accurate and segmentation accuracy is improved.

METHODOLOGY
The Mask R-CNN algorithm was proposed by He et al. to achieve object segmentation. 26 Based on the Faster R-CNN algorithm, a parallel target mask branch is added to obtain Mask R-CNN, enabling more detailed segmentation of detected targets. The method can effectively detect objects and complete high-quality instance segmentation, and it is among the most widely used deep learning instance segmentation algorithms. In this study, to achieve effective segmentation under low-light conditions and of small-target damage, an attention mechanism is integrated into the FPN. The network is shown in Figure 2. The Mask R-CNN with attention mechanism (AM-Mask R-CNN) is mainly divided into the following parts: the attention-module FPN (AM-FPN), Region Proposal Network (RPN), ROIAlign, classification network, and mask segmentation network. The multilayer features generated by different convolutional layers are not simply added together; to use target features at different scales efficiently, the FPN fuses the different feature maps through a top-down pathway with lateral connections. The attention mechanism improves the neural network's ability to understand the image, which helps extract subtle features. To solve the problem of small damage being missed by other segmentation algorithms, the attention mechanism is integrated into the FPN, organically combining the macroscopic and microscopic features of the target and improving the sensitivity of small object detection.

FPN fused with attention mechanism
After image feature extraction, the feature map will pass through the parallel structure of attention mechanism module and feature pyramid network module, respectively. Then, pixel-wise addition of the feature layers output by the two modules is performed. Next, candidate boxes are generated by RPN. Finally, the final result is obtained by segmentation network. The attention module of the parallel structure consists of two branches, the upper branch is the Position Attention Module (PAM) branch, and the lower branch is the Channel Attention Module (CAM) branch. The PAM module extracts class-based correlations from feature maps. The CAM module improves the feature reconstruction process by weighting the class channels to obtain better contextual representation.
The position attention module (PAM) is shown in Figure 4. Given a feature map A_j, two new features B_j and C_j are obtained through a convolution operation with a BN layer and a ReLU layer. Then, the two features are reshaped. Next, a softmax layer is applied to calculate the position attention map, as shown in Equation (1).
where s_ij indicates the influence of the ith position on the jth position; the more similar the features of the two positions, the greater this value. B_i denotes the transpose of B_j. The feature A_j is also fed into a convolutional layer with BN and ReLU layers to produce another feature map D_j. The final output of the position attention module is shown in Equation (2).
In the above equation, E_j is the final output, s_ji is the position attention weight, D_i is the ith position of the generated intermediate feature D, α is the influence factor, and A_j is the input feature.
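The bodies of Equations (1) and (2) do not appear in the text; the description matches the position attention of the standard dual attention (DANet) formulation, under which they read:

```latex
s_{ji} = \frac{\exp(B_i \cdot C_j)}{\sum_{i=1}^{N} \exp(B_i \cdot C_j)} \tag{1}

E_j = \alpha \sum_{i=1}^{N} \left( s_{ji} D_i \right) + A_j \tag{2}
```

Here N is the number of spatial positions; in DANet, α is a learnable scalar initialized to 0, so the attention branch is blended in gradually during training.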
Similarly, for the channel attention module, shown in Figure 5, the influence of the ith channel on the jth channel is first calculated, as shown in Equation (3). The final output of the channel attention module is then calculated, as shown in Equation (4).
In the above equation, x_ji represents the influence of the ith channel on the jth channel, A_j is the input feature, A_i is the transpose of A_j, and β is the weighting factor.
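As with the position branch, the bodies of Equations (3) and (4) match the channel attention of the standard DANet formulation, under which they read:

```latex
x_{ji} = \frac{\exp(A_i \cdot A_j)}{\sum_{i=1}^{C} \exp(A_i \cdot A_j)} \tag{3}

E_j = \beta \sum_{i=1}^{C} \left( x_{ji} A_i \right) + A_j \tag{4}
```

Here C is the number of channels, and β, like α above, is a learnable weighting factor initialized to 0.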

Region proposal networks
The RPN 37 is responsible for providing the classification network with RoIs in the image, that is, the locations where cracks, potholes, and FOD are to be detected, within preselected boxes generated according to established rules using trained parameters. The RPN is shown in Figure 6 and can be viewed as a sliding window searching for target regions. For each position of the feature map of the last convolutional layer (conv5-3), the RPN generates a fully connected feature of length 256 (corresponding to a ZF network). This feature is fed into two fully connected layers: a classification layer, which judges whether each candidate region is foreground or background and filters out background candidates, and a regression layer, which adjusts the position of the candidate region relative to the anchor at the center of the predicted candidate box.
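The anchor scheme the RPN slides over the feature map can be sketched as follows. The scales and aspect ratios are the common Faster R-CNN defaults, which the text does not specify, so they are illustrative assumptions:

```python
import numpy as np

def make_anchors(base_size=16, ratios=(0.5, 1.0, 2.0), scales=(8, 16, 32)):
    """Generate the k = len(ratios) * len(scales) anchor boxes centred on one
    sliding-window position, as (x1, y1, x2, y2) corner coordinates."""
    cx = cy = (base_size - 1) / 2.0
    anchors = []
    for r in ratios:               # aspect ratio r = height / width
        for s in scales:
            area = (base_size * s) ** 2
            w = np.sqrt(area / r)
            h = w * r
            anchors.append([cx - (w - 1) / 2, cy - (h - 1) / 2,
                            cx + (w - 1) / 2, cy + (h - 1) / 2])
    return np.array(anchors)

anchors = make_anchors()   # 9 anchors per sliding-window position
```

The classification layer then scores each of these k boxes as foreground or background, and the regression layer refines their coordinates.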

ROIAlign
ROIAlign is a layer that generates a fixed-size feature map from the RPN output, which may contain rectangular candidate regions of different sizes, for subsequent classification and bounding-box regression. ROIAlign addresses the misalignment between original-image pixels and feature-map pixels caused by the ROI Pooling operation in Faster R-CNN by using bilinear interpolation to obtain image values at the floating-point coordinates of sampling points. The number and position of these sampling points do not greatly affect performance, while the discontinuous quantization process becomes continuous and introduces no error, which improves detection accuracy and is more conducive to instance segmentation.
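The bilinear interpolation at the heart of ROIAlign can be illustrated with a minimal sketch: a single sampling point on a toy feature map. Real implementations sample several points per output bin and average them; this sketch also ignores border clipping.

```python
import numpy as np

def bilinear_sample(feat, y, x):
    """Sample a 2-D feature map at a floating-point coordinate (y, x),
    weighting the four surrounding cells instead of rounding to one."""
    y0, x0 = int(np.floor(y)), int(np.floor(x))
    y1, x1 = y0 + 1, x0 + 1
    wy1, wx1 = y - y0, x - x0          # fractional parts
    wy0, wx0 = 1.0 - wy1, 1.0 - wx1
    return (feat[y0, x0] * wy0 * wx0 + feat[y0, x1] * wy0 * wx1 +
            feat[y1, x0] * wy1 * wx0 + feat[y1, x1] * wy1 * wx1)

feat = np.arange(16, dtype=float).reshape(4, 4)
val = bilinear_sample(feat, 1.5, 2.5)   # midway between rows 1-2, cols 2-3
```

At (1.5, 2.5) the result is the mean of the four neighbouring cells, which is exactly the continuity that ROI Pooling's coordinate rounding destroys.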

Classification networks and masked segmentation networks
The role of the classification network is to determine the type of target within each RoI and to finely tune the RoI position to match the object in the image more closely; it is implemented with a CNN. The role of the mask network is to classify each pixel, based on the location and class of the RoI output by the classification network, deciding whether the pixel belongs to a crack, pothole, FOD, or background. The mask network mainly uses a fully convolutional network to classify the pixels and thus segment the target damage area.

Multitasking loss functions
The training phase of a deep learning model optimizes the network parameters by learning from annotated images so that the network fits the feature distribution of the detection target. The optimization objective during training is the loss function of the model, which is defined as shown in Equation (5). In the above equation, L_RPN is the position error between the prediction box and the ground-truth box in the RPN layer, L_cls is the prediction error of the classification network in determining the target category within each RoI, L_reg is the position error between the prediction box and the ground-truth box in the classification network, and L_Mask is the mask segmentation error in the mask segmentation network.
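Given the four error terms defined above, Equation (5) in the standard Mask R-CNN multitask formulation is simply their sum:

```latex
L = L_{\mathrm{RPN}} + L_{\mathrm{cls}} + L_{\mathrm{reg}} + L_{\mathrm{Mask}} \tag{5}
```

Each term is minimized jointly, so proposal quality, classification, box regression, and mask prediction are trained end to end with a single optimizer.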

Dataset sources
This paper investigates the effectiveness of Mask R-CNN in detecting airport pavement damage under low-light conditions, using the pavement of Zhengzhou Xinzheng International Airport as an example. The constructed dataset covers the diversity of pavement damage detection under low-light conditions. Following the format of the COCO dataset released by the Microsoft team for image segmentation and object detection, a segmentation dataset adapted to airport pavement damage was created for this experiment. High-resolution cameras were used to capture images of airport pavement distress on the main runways, fast runways, and runway shoulders, covering cracks, potholes, and FOD, the main types of runway damage; lighting and angle conditions were adjusted to simulate various complex environments. To prevent the model from overfitting, damage pictures with similar characteristics were removed from the samples, and 1300 high-quality pictures with obvious characteristics were selected as initial data, including 507 pavement cracks, 438 pavement potholes, and 355 pieces of pavement debris.

Dataset preprocessing
As the captured images may have different pixel sizes, they are uniformly cropped to 512 × 512 pixels in this study to facilitate training of the deep learning model. In addition, image acquisition may suffer from noise such as image dimness, so the acquired images need to be denoised. In this study, a Gaussian filter is used to smooth the images and eliminate the noise generated during capture.
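The cropping and Gaussian smoothing steps can be sketched as follows. The kernel radius and σ are illustrative choices, as the text does not state them, and a pure-NumPy separable filter stands in for a library call:

```python
import numpy as np

def gaussian_kernel1d(sigma, radius):
    """Normalized 1-D Gaussian kernel of length 2 * radius + 1."""
    x = np.arange(-radius, radius + 1, dtype=float)
    k = np.exp(-x ** 2 / (2 * sigma ** 2))
    return k / k.sum()

def gaussian_blur(img, sigma=1.0):
    """Separable Gaussian smoothing (the denoising step described above).
    Sketch only; production code would use cv2.GaussianBlur or
    scipy.ndimage.gaussian_filter."""
    radius = int(3 * sigma)
    k = gaussian_kernel1d(sigma, radius)
    padded = np.pad(img.astype(float), radius, mode="reflect")
    rows = np.apply_along_axis(lambda r: np.convolve(r, k, mode="valid"), 1, padded)
    return np.apply_along_axis(lambda c: np.convolve(c, k, mode="valid"), 0, rows)

def center_crop(img, size=512):
    """Crop the central size x size patch (image assumed at least that large)."""
    h, w = img.shape[:2]
    top, left = (h - size) // 2, (w - size) // 2
    return img[top:top + size, left:left + size]

smoothed = gaussian_blur(np.ones((8, 9)))   # a constant image is unchanged
```

Because the Gaussian kernel is normalized, smoothing preserves overall brightness while suppressing high-frequency sensor noise.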

GAN-based data enhancement
It is well known that deep learning requires the support of big data. However, it is difficult to collect high-quality, large-scale data from real-world airport pavements, because the more severe the damage, the more difficult it is to collect: given their severe impact, larger cracks and debris are repaired or cleaned up as soon as they are found, so images of such damage are hard to obtain. Therefore, data enhancement is required. Common data enhancement methods include flip, translation, and scale transforms, which can effectively increase the amount of data available for model training; however, the enhanced images inherently share a similar feature distribution with the originals, so the resulting performance improvements are limited. The development of deep learning provides a more effective route for data enhancement. Numerous studies have demonstrated that Generative Adversarial Networks (GAN) can be applied to data enhancement and generate realistic images extremely close to real ones. These images are completely new samples, different from those in the original dataset, so they fill in parts of the real image distribution not covered by it. This effectively expands the dataset, alleviates the uneven distribution of images, and can significantly improve the performance of the deep learning model. The GAN data enhancement model used in this study is shown in Figure 7.
This method is an unsupervised deep learning model. The generative adversarial network framework contains two modules: the generative network (G, Generator) and the discriminator network (D, Discriminator). From random noise z, the generative network produces a batch of fake images. Real and fake images are mixed as input, and the discriminator network determines whether each image is real, outputting the probability that it is: an output of 1 means the image is real, and 0 means it is fake.
In the training process, the goal of the generative network is to generate realistic fake images that deceive the discriminator network, while the goal of the discriminator network is to distinguish real images from generated fakes; the two networks form a dynamic game. In the best case, the generative network produces fake images that are sufficiently "realistic" that the discriminator network can hardly determine whether they are real. The result is a generative model G that generates realistic fake images. An example of a generated airport pavement image is shown in Figure 8.
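The dynamic game described above is, in the original GAN formulation of Goodfellow et al., the minimax objective:

```latex
\min_G \max_D V(D, G) =
  \mathbb{E}_{x \sim p_{\mathrm{data}}(x)}[\log D(x)] +
  \mathbb{E}_{z \sim p_z(z)}[\log(1 - D(G(z)))]
```

D is trained to maximize V (assigning high probability to real images x and low probability to generated ones G(z)), while G is trained to minimize it; at the equilibrium described in the text, D outputs approximately 0.5 everywhere.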
The Fréchet inception distance (FID) is a comprehensive and widely used metric for evaluating generated images. 38 FID is the Fréchet distance between two multidimensional Gaussian distributions and is primarily used to measure the overall realism and variety of the generated images. The calculation of FID is shown in Equation (6). 39 The FID computed in this study is 34.78.
In the above equation, m_r represents the feature mean of real airport pavement damage images, m_s the feature mean of synthesized airport pavement damage images, cov_r the feature covariance of the real images, and cov_s the feature covariance of the synthesized images.
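With the means and covariances defined above, the standard FID formula, which Equation (6) follows, is:

```latex
\mathrm{FID} = \lVert m_r - m_s \rVert_2^2 +
  \mathrm{Tr}\!\left( \mathrm{cov}_r + \mathrm{cov}_s
  - 2\,(\mathrm{cov}_r \, \mathrm{cov}_s)^{1/2} \right) \tag{6}
```

Both terms are non-negative, and a lower FID indicates that the generated distribution is closer to the real one in the Inception feature space.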

Dataset construction
Based on the results of GAN data enhancement, this study constructed an airport pavement damage data set. The dataset was then manually annotated using LabelMe software, and an example of the annotation is shown in Figure 9. The annotated dataset was randomly divided into a training set, a validation set and a test set in the ratio of 4:1:1, as shown in Table 1. The training set is mainly used for training the Mask R-CNN model with fused attention mechanism, the validation set is mainly used for model verification and model hyperparameter tuning, and the test set is used for testing the robustness and generalization ability of the model.
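The 4:1:1 random split can be sketched as follows; the random seed is an illustrative choice, as the paper does not give one:

```python
import random

def split_dataset(items, ratios=(4, 1, 1), seed=42):
    """Shuffle items and split them into train/val/test in the given ratio."""
    items = list(items)
    rng = random.Random(seed)      # fixed seed keeps the split reproducible
    rng.shuffle(items)
    total = sum(ratios)
    n_train = len(items) * ratios[0] // total
    n_val = len(items) * ratios[1] // total
    train = items[:n_train]
    val = items[n_train:n_train + n_val]
    test = items[n_train + n_val:]
    return train, val, test

train, val, test = split_dataset(range(1200))
```

Shuffling before splitting matters here: it keeps the proportions of cracks, potholes, and FOD roughly equal across the three subsets.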

Experimental environment
All experiments were performed on the same server, configured with an Intel(R) Core(TM) i7-8750H CPU @ 2.20 GHz running the Ubuntu 18.04 operating system and one GeForce RTX 2080 Ti GPU, with CUDA 8.0 and cuDNN 6.0.0 for neural network acceleration. TensorFlow and Keras are used as the deep learning frameworks, and all experiments are implemented in Python 3.6.

Evaluation indicators
To evaluate the damage detection effectiveness of the trained model, this paper uses the performance indices of object detection: precision, recall, F1-score, and mean intersection over union (mIoU). Three counts are defined: TP (samples that are actually positive and classified as positive), FP (samples that are actually negative but classified as positive), and FN (samples that are actually positive but classified as negative). When the IoU (the overlap rate between the predicted candidate box and the original marker box) exceeds 0.5, the detection result is counted as a TP.
The precision is the proportion of samples detected as positive by the model that are actually positive, as shown in Equation (7).
The recall is the proportion of the actually positive samples in the test set that the model detects as positive, as shown in Equation (8).
The F1-score provides a comprehensive evaluation of the model's performance: it measures the classifier by the harmonic mean of precision and recall. Its calculation formula is shown in Equation (9).
The mIoU is commonly used in the evaluation of image segmentation. 27,40 It represents the average overlap accuracy; its calculation formula is shown in Equation (10), where n is the total number of categories (n = 3 in this study).
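Since the formula bodies for Equations (7) through (10) are standard, they can be written out directly from the TP/FP/FN definitions above. This is a minimal sketch in which mIoU is averaged over per-class counts:

```python
def precision(tp, fp):
    """Eq. (7): fraction of detections that are true damage."""
    return tp / (tp + fp)

def recall(tp, fn):
    """Eq. (8): fraction of true damage that is detected."""
    return tp / (tp + fn)

def f1_score(p, r):
    """Eq. (9): harmonic mean of precision and recall."""
    return 2 * p * r / (p + r)

def iou(tp, fp, fn):
    """Per-class intersection over union."""
    return tp / (tp + fp + fn)

def mean_iou(per_class_counts):
    """Eq. (10): IoU averaged over the n classes (n = 3 here: cracks,
    potholes, FOD); each entry is a (tp, fp, fn) tuple."""
    return sum(iou(*c) for c in per_class_counts) / len(per_class_counts)
```

Note the asymmetry the text describes: precision penalizes false alarms (FP), recall penalizes missed damage (FN), and the F1-score balances the two.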

Hyperparameters set
The deep learning model must set the initial learning rate, weight decay, and momentum coefficient hyperparameters. Weight decay can effectively prevent the model from overfitting, and the momentum coefficient can improve learning efficiency. The learning rate controls the rate of gradient descent and affects the speed of model training. If the learning rate is too large, the loss value increases and the model does not converge; conversely, if it is too small, the loss value barely changes and the model learns too slowly. When training starts, a large learning rate should be set so that the loss decreases rapidly; as the model converges, the rate should be decayed to gradually approach the global minimum. In this paper, the initial learning rate is decayed by a factor of 10 every 2000 iterations. To determine the optimal combination of hyperparameters, this paper compares multiple sets of hyperparameters through several comparison experiments; the results are shown in Table 2. Accordingly, the optimal combination is chosen as an initial learning rate of 0.001, a momentum coefficient of 0.95, and a weight decay of 0.00035.
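The step decay described, an initial rate of 0.001 divided by 10 every 2000 iterations, can be written as:

```python
def step_lr(iteration, base_lr=0.001, decay_every=2000, factor=0.1):
    """Step decay schedule: the learning rate is multiplied by `factor`
    once per `decay_every` iterations, starting from `base_lr`."""
    return base_lr * factor ** (iteration // decay_every)
```

So iterations 0 to 1999 train at 0.001, iterations 2000 to 3999 at 0.0001, and so on, matching the fast-then-fine behavior described above.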

Effect of the number of iterations on the model
The number of iterations has a significant impact on the accuracy and training time of the model. If the number of iterations is too low, the training time is shortened but the training loss may not be minimized, resulting in low accuracy; if it is too high, the loss can be minimized but the training time becomes unnecessarily long, wasting resources. Choosing the right number of iterations is therefore essential for training deep learning models. To determine the optimal number, the maximum number of iterations is set to 12,000 in this paper, the training loss is recorded, and the decreasing loss curve is plotted in Figure 10. As can be seen from Figure 10, the model converges when the number of iterations reaches 9500.

FIGURE 10 Model training loss curve
The model was then saved every 200 iterations, and its performance was validated on a validation set not used for training; the F1-score curves are shown in Figure 11. Table 3 shows the validation results and training times for 1000, 3000, 5000, 8500, 9500, 11,000, and 12,000 iterations. From Table 3 and Figure 11, the F1-score increases from 0 to 9500 iterations, is essentially unchanged between 8500 and 9500 iterations, and decreases from 9500 to 12,000 iterations, likely due to overfitting of the model. Since 8500 iterations require less training time than 9500, 8500 is chosen as the optimal number of iterations in this study; the F1-score there reaches 0.9585 at the lowest computational cost.

Impact of data enhancement on the model
The training of deep learning models requires large amounts of data, and data augmentation is often required because the amount of airport pavement distress data is small. To verify the effect of data augmentation on model training, this study compares the experimental results of no augmentation, common augmentation methods (flip, translation, and scale transforms, etc.), and the GAN-based augmentation method. Table 4 shows the segmentation F1-scores for all damages, cracks, potholes, and debris, from which it can be seen that the GAN-based method achieves the best segmentation F1-score. The images generated by the GAN model are realistic and very close to real images, and they are new samples, distinct from those in the original dataset. They therefore fill in parts of the real image distribution not covered by the original dataset, which effectively expands the dataset and alleviates the uneven distribution of images, significantly improving the segmentation performance of the model. Therefore, the GAN-based data enhancement approach is chosen in this study to effectively enhance the original dataset.

Model testing
To further demonstrate the training effectiveness of the model, this study tested it on a test set of 1200 images not used for training or validation. Figure 12 shows examples of crack, pothole, and foreign object debris results under low-light conditions; the numbers in the figure indicate the probability that the identified area is that type of damage. Figure 12 shows that the proposed model is able not only to classify the damage but also to find its exact outline. Precision, recall, F1-score, and mIoU were calculated, and the results are shown in Table 5.
As can be seen from Table 5, the mIoU values for the three damage types are 0.9348, 0.9452, and 0.9387, respectively, and the segmentation speeds reach 10.4 FPS, 12.5 FPS, and 9.7 FPS, respectively. The average segmentation time is 1.34 s per image, allowing accurate and efficient segmentation of damage under low-light conditions. This is because the network feeds valid RoIs, rather than the whole image, into the final segmentation module, which saves considerable computation time. In particular, effective and fast segmentation is also achieved for small targets such as debris, owing to the attention mechanism incorporated into the model, which effectively improves the sensitivity of small object detection.

Comparative model analysis
In order to compare the segmentation performance of the proposed method with other methods, the traditional Canny edge detection algorithm, 11 Fully Convolutional Networks (FCN), DeepCrack, 32 and the model proposed in this study were compared experimentally. The four models were trained on the same dataset in the same experimental environment and tested on the test set. The experimental segmentation results are shown in Figure 13, where panels A to C show the segmentation results for cracks, potholes, and FOD, respectively.

FIGURE 13 Example test images: (A) crack, (B) pothole, (C) foreign object debris

From Figure 13A, it can be seen that the traditional edge detection algorithm can no longer fully identify the crack region, FCN and DeepCrack cannot effectively identify the boundary of the crack region, and the model proposed in this study achieves effective segmentation. Figure 13B shows the segmentation of potholes: the results of the traditional edge detection algorithm and FCN are similar to those of the proposed model, but the boundary predicted by this study's method is closer to the real one, and the DeepCrack model produces false detections. Figure 13C compares the segmentation results for small foreign-object targets, where this method is more effective; the edge detection algorithm incorrectly identifies background regions as target regions. This is because conventional Canny edge detection 11 is not robust to noise and may miss or falsely detect targets in images with low contrast between the background and target regions. The model in this paper incorporates an attention mechanism in the FPN, which organically combines the macro and micro features of the target and improves the sensitivity of small target detection.
In addition, the model uses a non-maximum suppression algorithm to retain more accurate RoI feature maps of the target region, so its segmentation performance is superior to that of the plain FPN. To compare the four models more clearly, this study calculated the precision, recall, F1-score, mIoU, and average speed of the four models on the test set; the results are shown in Table 6. The precision, recall, and F1-score of the model proposed in this study are better than those of the other three methods, with values of 0.9436, 0.9578, 0.9489, and 0.9067, respectively. In addition, the mIoU of the proposed model is 0.9388. The segmentation speeds of the four models are 11.8 FPS, 3.7 FPS, 8.9 FPS, and 9.3 FPS, respectively. The edge detection algorithm requires manual feature extraction and cannot be automated, and FCN must process features from multiple whole images, whereas the proposed network inputs only valid RoIs into the final segmentation module instead of the whole image, which saves considerable computation time. As a result, the time cost of the proposed model is much lower. In summary, the model proposed in this study offers better segmentation accuracy and efficiency, and can achieve accurate and fast segmentation of airport pavement damage.
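For reference, the pixel-level precision, recall, and F1-score used in the table can be computed from binary masks as in this minimal sketch (illustrative only; the function name is ours, not from the original evaluation code):

```python
import numpy as np

def pixel_prf(pred: np.ndarray, gt: np.ndarray):
    """Pixel-level precision, recall, and F1-score for binary
    segmentation masks (True = damage pixel)."""
    tp = np.logical_and(pred, gt).sum()          # predicted damage, is damage
    fp = np.logical_and(pred, ~gt).sum()         # predicted damage, is background
    fn = np.logical_and(~pred, gt).sum()         # missed damage pixels
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1
```

The F1-score is the harmonic mean of precision and recall, so it penalizes a model that trades one heavily for the other, which is why it is the headline metric in the comparison.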

CONCLUSION
A segmentation model for airport pavement cracks, potholes, and FOD incorporating an attention mechanism was proposed in this study. The FPN was fused with a dual-channel attention mechanism: the position attention module extracts class-based correlations from the feature maps, and the channel attention module improves the feature reconstruction process by weighting the class channels, yielding a better contextual representation. Subtle features in the image are thus extracted, which in turn improves segmentation accuracy. Because airport pavement damage data are difficult to collect, this study used a GAN to achieve effective data augmentation after a set of airport pavement damage images was collected and preprocessed. The labelme software was then used to label the data, producing a dataset for training, validating, and testing the proposed segmentation model.
To select the optimal training model, the optimal combination of hyperparameters was first determined through multiple comparative experiments: the number of iterations, initial learning rate, momentum coefficient, and weight decay are 8500, 0.001, 0.95, and 0.00035, respectively. Second, the validation results without data augmentation, with ordinary augmentation methods (flip, translation, and scale transformations, etc.), and with GAN-based augmentation were compared. The effectiveness of GAN augmentation was demonstrated, with the F1-score on the validation set reaching 0.9485. The trained model was then evaluated on the test set. The experimental results show that the proposed model can effectively segment small target areas and dim images, with a minimum F1-score of 0.9445 and an mIoU of 0.9388. In addition, to further verify the model's effectiveness, comparison experiments with threshold segmentation, FCN, and DeepCrack were carried out. The results show that the proposed model has superior segmentation performance, which proves the effectiveness of the model.
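The hyperparameter combination above can be captured in a small configuration sketch (the key names are our own illustrative choices, not taken from the original training scripts, and the momentum/weight-decay pairing follows the values as listed):

```python
# Hyperparameters reported for the AM-Mask R-CNN training run.
# Key names ("max_iterations", "base_lr", ...) are illustrative only.
TRAIN_CONFIG = {
    "max_iterations": 8500,   # total training iterations
    "base_lr": 0.001,         # initial learning rate
    "momentum": 0.95,         # momentum coefficient
    "weight_decay": 0.00035,  # L2 regularization strength
}
```

Such a dict would typically be passed to an SGD-with-momentum optimizer; the paper does not specify the optimizer implementation, so that wiring is left out here.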
Other types of damage also affect airport pavement safety. In future work, we will optimize our model to detect more damage types. In addition, the dimensions of the damage will be measured to provide more useful information for airport pavement maintenance agencies.