A blind contour-aware quality model for sonar images

Since the penetration depth of visible light is limited in muddy and dark deep marine environments, sonar imaging, which is independent of natural light, has been attracting much attention in underwater detection. However, the visual quality of a sonar image is inevitably damaged during acquisition and transmission in the sophisticated and dynamic underwater environment. This quality decrease can further impair oceanic information analysis. To measure the quality decrease of sonar images, this paper proposes a contour-aware model for sonar image quality assessment (CaMSIQA). The CaMSIQA metric is established upon the conclusion that the quality of sonar images can be defined as a measure of their utility in real application scenarios. We exploit a contour information statistics (CIS) model that captures information critical to object recognition in sonar images. Quantifying the changes in the CIS model between unimpaired and test sonar images makes it possible to perceive the quality decrease of sonar images. Extensive experimental results have demonstrated the effectiveness of the proposed method compared with current mainstream image quality assessment (IQA) methods without full reference information.


INTRODUCTION
This is an open access article under the terms of the Creative Commons Attribution License, which permits use, distribution and reproduction in any medium, provided the original work is properly cited. © 2021 The Authors. IET Image Processing published by John Wiley & Sons Ltd on behalf of The Institution of Engineering and Technology.

With the emergence of ocean exploration in recent years, underwater detection has received increasing attention. Object imaging in muddy and dark deep marine environments is generally achieved via sonar devices since they require no natural light for imaging. A complete sonar system takes a series of processing steps, that is, image acquisition, compression and transmission, each of which often introduces distortions as a consequence of the limitations of bandwidth and the deterioration of underwater acoustic channels. The ability to estimate the impact of these impairments on sonar image quality is a significant problem in underwater detection tasks. Therefore, reliably exploiting sonar image quality assessment (SIQA) methods for monitoring the wide field of image processing procedures is of high practical relevance. Due to the insufficient capabilities of current image processing, human intervention is still required in underwater detection. Accordingly, numerous objective image quality assessment (IQA) approaches designed from human-oriented perspectives have been proposed for optical images. Presently, IQA algorithms can be commonly categorized into three types. The first type is the statistic-based approach. These algorithms outline intrinsic image characteristics by utilizing a statistical model from different aspects. Representative work of this type includes natural scene statistics (NSS) models [1,2], information and structure models [3,4], and so on. The second type predicts perceptual quality by modelling the characteristics of the human visual system (HVS) from psychological vision science.
These approaches quantify the visual attributes in image understanding and appreciation by making assumptions about the HVS's behaviour [5][6][7][8].
The third type trains a deep neural network to predict image quality based on a corpus of high-level features [9][10][11].
Though a large set of IQA methods has been developed for optical images, their performance indices fall far short of ideal results in SIQA. The reasons are threefold: (1) Sonar images have different statistical characteristics from optical images. To be specific, optical images display rich colour variations, complex texture content and thick lines, while sonar images are greyscale with a low mean value and a bimodal histogram. For a more intuitive comparison, an example of a sonar image compared with an optical image is shown in Figure 1. (2) Images obtained by distinct sonar devices have different characteristics; therefore, SIQA is equipment specific. (3) Most optical IQA algorithms are general purpose and visual perception oriented, whereas SIQA should be task oriented since the evaluation criteria of sonar image quality are application specific. We tend to assess the utility quality of sonar images under specific applications, which cannot be substituted by perceptual quality [12,13].
We define image utility quality as a practicability indicator of the test image in specific applications. Some device- and operation-based parameters have been extracted for synthetic aperture sonar (SAS) image utility quality assessment. An SAS image that performs well at adapting the route of a sonar-equipped autonomous underwater vehicle is considered to have sufficient quality. In this case, measurements of sonar-platform motion, environmental characteristics and the degree of navigation errors have been employed to represent the utility quality of SAS images [14,15]. In [16], some low-level features are correlated with the utility quality of forward-looking sonar (FLS) images. The extracted features are indicators of the obstacle recognition performance of the test image. In these works, the utility quality of a sonar image with low precision and resolution is predicted based upon the performance of relevant underwater equipment in specific applications. Nevertheless, an increasing number of sonar images require human decision in underwater detection due to the appearance of high-precision sonar imaging technologies in complex detection tasks. For instance, advanced acoustic lens and side-scan sonar devices have been commonly used in underwater detection in recent years because of their sufficient precision and resolution [17]. In human-involved underwater detection, sonar images with clear and visually recognizable objects are considered high quality. According to this principle, we have proposed a series of SIQA methods in [18][19][20]. These methods predict the utility quality of acoustic lens and side-scan sonar images with respect to the visual characteristics of object detection.
Despite the aforementioned efforts, SIQA is still a largely underexplored field compared with optical IQA. The important and challenging issues remaining in SIQA can be divided into three categories. First, in order to adapt to the requirements of complex detection, utility quality should be evaluated in consideration of the HVS. However, existing SIQA approaches for low-resolution images only take service indicators into account. Second, due to the limited bandwidth of underwater acoustic channels, the transmission burden should be as light as possible. Based upon the accessibility of reference information, IQA algorithms can be classified into three groups. The full-reference (FR) [21,22], no-reference (NR) [23,24] and reduced-reference (RR) [25,26] methods predict the quality of a test image by comparing relevant features with complete, no and partial reference information, respectively. The state-of-the-art FR and RR SIQA methods [19,20] need at least 1300 bits to represent the reference information, which is a hefty burden on underwater acoustic channels. Third, the training process of SIQA methods should be as lightweight and efficient as possible on account of the lack of sonar image materials and the real-time demands of many underwater scenarios. The state-of-the-art NR SIQA metric [18] trains and aggregates multiple base learners to avoid overfitting, the computational complexity of which is many times higher than that of a single learner.
To address these problems, we propose an NR contour-aware model for sonar image quality assessment (CaMSIQA) for acoustic lens and side-scan sonar devices. The major contributions of this paper are summarized in the following three aspects: 1) Proposing an SIQA method that excludes any reference from the pipeline. This is highly practical for underwater acoustic transmission with limited channel bandwidth. 2) Proposing a contour information statistics (CIS) model to depict the statistics of the main morphology of objects in a sonar image. The CIS model is motivated by the requirements of the common application scenario of sonar images, that is, underwater detection. 3) Quantifying the differences between the CIS model parameters of distorted and unimpaired sonar images by metric learning. This offers a good trade-off between accuracy and computational complexity in contrast to conventional learning machines.
The overall structure of this paper is divided into four parts, including this introduction. The details about the CIS model as well as the metric learning approach are presented in Section 2. Section 3 provides the quantitative and qualitative performance comparisons between the CaMSIQA method and the existing state-of-the-art RR and NR IQA methods. In Section 4, this study is concluded.

METHODOLOGY
In this section, the details of the CaMSIQA method are expounded. The method is built upon a corpus of contour-aware features, following the conclusion that image contour is significant for underwater detection [18]. We find that the statistical regularities of the exploited contour-aware features can be observed by fitting a parametric distribution. Therefore, the CIS model is derived from a fit to the contour-aware features. The benchmark of the CIS model is established from a collection of undistorted sonar images so that CaMSIQA can work without an unimpaired reference. We also estimate the CIS model parameters from the test sonar image. The distance between the parameters of these two models is utilized as the indicator of the utility quality of the test image. To make the distance measurement more accurate, we exploit a supervised metric learning algorithm consisting of dimensionality reduction and Euclidean distance measurement.

Contour-aware features
The importance of the structural information exploited by the HVS for semantic object understanding has been verified in [13].
Although structure-based IQA has been well researched, existing structure-relevant works perform unsatisfactorily on sonar images. The reason can be attributed to the lack of discrimination between global and local structures. Specifically, the global structure dominates the object description, while the local structure mostly affects the comfort of an image. Thus, instead of taking all components of image structure into account in SIQA, extracting global structure-based features is a preferable alternative. We extract contour-aware features, which are one of the primary representations of global structure, in accordance with the feature extraction framework proposed in [18].
To collect contour information from multiple aspects, the sonar image is first transformed into the frequency and spatial domains. For the frequency domain, the discrete cosine transformation (DCT) and the Cohen-Daubechies-Feauveau 9/7 (C-D-F 9/7) wavelet transform are employed for fixed- and multi-scale analysis. The coefficient matrices of the DCT and the C-D-F 9/7 wavelet transform are denoted as C_D and C_W. Meanwhile, the singular value decomposition (SVD) is utilized to depict contour information in the spatial domain. We denote C_S as the diagonal matrix of the SVD transform. According to [22] and [27], the elements of C_D, C_W and C_S with large values are concise representatives of image contour. Thus, we define the sparsity of these coefficient matrices as the indicator of image contour. Since the Gini index and the Hoyer measure have different sensitivities to contour distortion [18], both measures, which outperform other sparsity measures [28], are utilized to describe the contour information more comprehensively. For a coefficient vector c with entries sorted in ascending order of magnitude, c_{(1)} \le c_{(2)} \le \dots \le c_{(N)}, the Gini index and the Hoyer measure are defined in Equations (1) and (2):

GI(c) = 1 - 2 \sum_{k=1}^{N} \frac{c_{(k)}}{\lVert c \rVert_1} \left( \frac{N - k + 1/2}{N} \right), \quad (1)

H(c) = \frac{\sqrt{N} - \lVert c \rVert_1 / \lVert c \rVert_2}{\sqrt{N} - 1}. \quad (2)
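The two sparsity measures can be sketched in Python as below. This is a minimal illustration, not the authors' implementation: the C-D-F 9/7 wavelet branch is omitted for brevity, and only the standard NumPy/SciPy DCT and SVD routines are used.

```python
import numpy as np
from scipy.fft import dctn

def gini_index(c):
    """Gini sparsity of a coefficient vector (0 = uniform, -> 1 = maximally sparse)."""
    c = np.sort(np.abs(np.ravel(c)))            # ascending magnitudes c_(1) <= ... <= c_(N)
    n = c.size
    l1 = c.sum()
    if l1 == 0:
        return 0.0
    k = np.arange(1, n + 1)
    return 1.0 - 2.0 * np.sum((c / l1) * (n - k + 0.5) / n)

def hoyer_measure(c):
    """Hoyer sparsity: (sqrt(N) - l1/l2) / (sqrt(N) - 1)."""
    c = np.abs(np.ravel(c)).astype(float)
    n = c.size
    l1, l2 = c.sum(), np.sqrt(np.sum(c ** 2))
    if l2 == 0:
        return 0.0
    return (np.sqrt(n) - l1 / l2) / (np.sqrt(n) - 1.0)

# Toy usage: sparsity of DCT coefficients and singular values of a random image patch.
img = np.random.default_rng(0).random((32, 32))
c_d = dctn(img, norm='ortho')                   # fixed-scale frequency coefficients (C_D)
c_s = np.linalg.svd(img, compute_uv=False)      # spatial-domain singular values (C_S)
features = [gini_index(c_d), hoyer_measure(c_d),
            gini_index(c_s), hoyer_measure(c_s)]
```

A perfectly uniform vector gives a value near 0 under both measures, while a one-hot vector gives a Hoyer value of exactly 1, matching the intuition that sharper contours concentrate energy in few coefficients.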

CIS model
In this part, we try to capture the inherent properties of the extracted features in a universal and concise expression. For optical images, some statistical models have been proposed to capture the statistical behaviour of image features [29,30]. For example, given a set of natural scene images (NSIs), the statistics of their mean subtracted contrast normalized (MSCN) coefficients can be characterized by an asymmetric generalized Gaussian distribution (AGGD) model. Moreover, it has been proved in [31] that the MSCN coefficients have characteristic statistical properties that are changed by the presence of distortion. The NSS model, which captures the aforementioned statistical properties, is established upon this hypothesis. Since sonar images are visually and practically different from conventional NSIs, the NSS model is inferior for sonar images. Nevertheless, inspired by this pipeline, we also discover the statistical behaviour of the contour-aware features extracted from sonar images. Figures 2-4 visualize the distributions of contour-aware features extracted from sonar images with different quality decreases. From these figures, we can find that though a quality decrease modifies the statistics of the contour-aware features characteristically, the statistical behaviour of the distorted contour-aware features can still be well approximated by a Beta distribution. It is clear that the peak and tail behaviours of the fitted functions are distinguishably related to different quality decreases. From this observation, the model is given by Equation (3):

f(x; a, b) = \frac{x^{a-1} (1 - x)^{b-1}}{B(a, b)}, \quad x \in [0, 1], \quad (3)

where x denotes the contour-aware features. The shape of the function is controlled by a and b in Equation (3), which can be estimated by maximizing a modified likelihood that handles the zeros or ones by considering them as values that have been left-censored at the square root of the smallest sample or right-censored at 1 − ε (ε is a small constant), respectively [32].
B(⋅) is the Beta function, expressed as Equation (4):

B(a, b) = \frac{\Gamma(a)\Gamma(b)}{\Gamma(a + b)}, \quad (4)

with Γ(⋅) defined as Equation (5):

\Gamma(z) = \int_0^{\infty} t^{z-1} e^{-t} \, dt. \quad (5)

The mean and the variance of the distribution are also useful, and are defined in Equations (6) and (7):

\mu = \frac{a}{a + b}, \quad (6)

\sigma^2 = \frac{ab}{(a + b)^2 (a + b + 1)}. \quad (7)

By deploying the aforementioned parametric model to fit the empirical distributions of the contour-aware features of sonar images, we can obtain a model encoding the image contour information, which is dubbed the CIS model here. The estimated parameters along with the corresponding mean and variance form the parameter tuple of the relevant CIS model. The parameter tuple (a, b, μ, σ²) is utilized as the description of the fitted model. A general framework for the establishment of the CIS model is shown in Figure 5.
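A CIS parameter tuple can be estimated with SciPy's Beta fit, as sketched below. This is an illustration under the simplifying assumption that the censoring of boundary values is approximated by clipping; `cis_parameter_tuple` is a hypothetical helper name, not from the paper.

```python
import numpy as np
from scipy.stats import beta

def cis_parameter_tuple(features, eps=1e-6):
    """Fit a Beta distribution to features in [0, 1] and return (a, b, mean, variance).

    Boundary values at 0 or 1 are clipped into the open interval as a crude
    stand-in for the censored-likelihood treatment described in the text.
    """
    x = np.clip(np.asarray(features, float), eps, 1.0 - eps)
    # Fix the support to [0, 1] so only the shape parameters a, b are estimated.
    a, b, _loc, _scale = beta.fit(x, floc=0.0, fscale=1.0)
    mean = a / (a + b)                                # Beta mean
    var = a * b / ((a + b) ** 2 * (a + b + 1.0))      # Beta variance
    return a, b, mean, var
```

Fitting synthetic Beta(2, 5) samples recovers shape parameters near (2, 5) and a mean near 2/7, confirming the tuple is a faithful summary of the empirical distribution.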

Metric learning
To measure the distance between different CIS parameter tuples, a supervised metric learning algorithm is employed in the CaMSIQA method, which is characterized by marginal Fisher analysis (MFA) [33] and the Euclidean distance. The MFA is a dimensionality reduction algorithm that overcomes the limitation of the linear discriminant analysis algorithm that the samples of each class should obey a Gaussian distribution. Following the procedure of MFA, we divide the training sonar images into groups according to their MOS values. Images in the same group are assigned the same index. The MFA can be stated in four steps.
In the first step, we obtain a transformation matrix W_PCA by projecting the training data into the principal component analysis (PCA) subspace. In the second step, the intraclass compactness graph is constructed by Equation (8):

S_c = \sum_{i,j} \lVert w^{T} x_i - w^{T} x_j \rVert^2 W^{c}_{ij} = 2 w^{T} X (D^{c} - W^{c}) X^{T} w, \quad (8)

where w is the projection vector and X is the vertex set of a graph whose Laplacian matrix is defined as D^c − W^c. W^c_{ij} = 1 if and only if the samples x_i and x_j are among the k_c nearest neighbours of each other within the same class, and D^c is defined as Equation (9):

D^{c}_{ii} = \sum_{j} W^{c}_{ij}. \quad (9)

The interclass separability can be characterized by Equation (10):

S_s = \sum_{i,j} \lVert w^{T} x_i - w^{T} x_j \rVert^2 W^{s}_{ij} = 2 w^{T} X (D^{s} - W^{s}) X^{T} w, \quad (10)

where W^s_{ij} = 1 if and only if the pair (x_i, x_j) is among the k_s nearest pairs across different classes, and D^s is given by Equation (11):

D^{s}_{ii} = \sum_{j} W^{s}_{ij}. \quad (11)

In the third step, the objective function is given by Equation (12):

w^{*} = \arg\min_{w} \frac{w^{T} X (D^{c} - W^{c}) X^{T} w}{w^{T} X (D^{s} - W^{s}) X^{T} w}. \quad (12)

Finally, the projection direction becomes Equation (13):

W = W_{PCA} \, w^{*}. \quad (13)

Thanks to the projection matrix W, the parameter tuple obtained from the CIS model can be projected into a new space, and the distance between the parameter tuples of different CIS models can be measured using the Euclidean distance. The quality of the test sonar image is expressed as the distance between the different CIS models, as in Equation (14):

Q = \lVert p_r - p_d \rVert_2, \quad (14)

where p_r and p_d indicate the dimension-reduced parameter tuples extracted from the CIS models of the undistorted and test sonar images, respectively.
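A minimal sketch of the MFA-style projection and the Euclidean quality distance follows, assuming the standard generalized-eigenproblem formulation of MFA. The PCA pre-projection is omitted and the neighbour rules are simplified, so this is not the authors' exact implementation.

```python
import numpy as np
from scipy.linalg import eigh

def mfa_projection(X, y, k_c=2, k_s=2, dim=2):
    """Marginal Fisher analysis sketch.

    X: (n, d) matrix of CIS parameter tuples; y: group labels per sample.
    Returns a (d, dim) projection minimising the ratio of intraclass
    compactness to interclass separability (a Rayleigh quotient).
    """
    X = np.asarray(X, float)
    y = np.asarray(y)
    n, d = X.shape
    D2 = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=2)  # pairwise sq. distances
    Wc = np.zeros((n, n))
    Ws = np.zeros((n, n))
    for i in range(n):
        same = np.where((y == y[i]) & (np.arange(n) != i))[0]
        diff = np.where(y != y[i])[0]
        for j in same[np.argsort(D2[i, same])][:k_c]:   # k_c nearest same-class samples
            Wc[i, j] = Wc[j, i] = 1.0
        for j in diff[np.argsort(D2[i, diff])][:k_s]:   # k_s nearest other-class samples
            Ws[i, j] = Ws[j, i] = 1.0
    Lc = np.diag(Wc.sum(axis=1)) - Wc                   # compactness Laplacian D^c - W^c
    Ls = np.diag(Ws.sum(axis=1)) - Ws                   # separability Laplacian D^s - W^s
    Sc = X.T @ Lc @ X + 1e-6 * np.eye(d)                # regularised scatter matrices
    Ss = X.T @ Ls @ X + 1e-6 * np.eye(d)
    _, vecs = eigh(Sc, Ss)                              # generalised eigenvectors, ascending
    return vecs[:, :dim]                                # smallest-ratio directions

def quality_distance(W, p_r, p_d):
    """Euclidean distance between projected reference and test parameter tuples."""
    return float(np.linalg.norm(W.T @ np.asarray(p_r, float) - W.T @ np.asarray(p_d, float)))
```

The small ridge term keeps the scatter matrices positive definite so that the generalized eigenproblem is well posed even for low-dimensional parameter tuples.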

Sonar image quality database
In [19], the Sonar Image Quality Database (SIQD) was constructed for evaluating SIQA indices with adequate ground-truth human subjective data. Thus, all experiments here were conducted on the SIQD database. We tabulate the important information about the SIQD database in Table 1. The SIQD database includes 40 reference images captured by advanced acoustic lens sonar and side-scan sonar in underwater scenarios, with a resolution of 320 × 320. The objects in the selected sonar images cover the most common scenes in underwater detection, such as underwater creatures, swimmers, the seabed and shipwrecks. In total, there are 800 distorted images in the SIQD database. The distortion of the sonar images is caused by lossy compression, that is, Compression based on Gradient-Based Recovery (ComGBR) coding [34] and Set Partitioning In Hierarchical Trees (SPIHT) coding [35], and man-made bit errors within the bitstreams. The bit error ratios were set in accordance with the latest achievements in underwater acoustic communications [36,37]. Each type of distortion has four to five distortion levels. Figure 6 shows examples of the four types of distortions. As illustrated in Figure 6, the compression coding causes blur, while the bit errors introduce noise and eccentricity into the sonar image, respectively. All of the aforementioned types of distortions take typical peculiarities of the HVS into account, and are common in practical underwater application scenarios. In the subjective tests, all images were divided into 20 groups, each of which was tested by 25 viewers with expertise in underwater target recognition. The Single Stimulus with Multiple Repetition (SSMR) method with a five-category discrete scale was employed to collect subjective scores. The collected scores were first converted to Z-scores. Then we rejected scores from unreliable viewers according to the procedure specified in ITU-R BT.500-11 [38].
After viewer rejection, the Z-scores were linearly rescaled to lie in the range [0, 100]. Finally, the mean of the rescaled Z-scores was computed as the MOS value of each test image. In this way, all images in the SIQD database are labelled by their MOS values.
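The Z-score conversion, rescaling and averaging steps can be sketched as follows. This is a simplified illustration that assumes viewer rejection has already been performed; `mos_from_scores` is a hypothetical helper name.

```python
import numpy as np

def mos_from_scores(scores):
    """Compute per-image MOS values from raw opinion scores.

    scores: (viewers, images) array of opinion scores after outlier rejection.
    Each viewer's scores are standardised (Z-scores), linearly rescaled to
    [0, 100] over the whole score pool, and averaged across viewers.
    """
    s = np.asarray(scores, float)
    # Per-viewer Z-scores remove individual rating bias and spread.
    z = (s - s.mean(axis=1, keepdims=True)) / s.std(axis=1, ddof=1, keepdims=True)
    z_min, z_max = z.min(), z.max()
    rescaled = 100.0 * (z - z_min) / (z_max - z_min)   # linear rescale to [0, 100]
    return rescaled.mean(axis=0)                        # MOS per image
```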

Evaluation protocols
To reveal the effectiveness of the proposed approach, comprehensive experiments were conducted with four popular IQA benchmarks. The selected IQA benchmarks can be classified into three groups according to their physical meanings. The first group includes the Spearman rank-order correlation coefficient (SROCC) and the Kendall rank-order correlation coefficient (KROCC) for prediction monotonicity measurement, defined in Equations (15) and (16):

SROCC = 1 - \frac{6 \sum_{i=1}^{M} d_i^2}{M (M^2 - 1)}, \quad (15)

KROCC = \frac{M_c - M_d}{\frac{1}{2} M (M - 1)}, \quad (16)

where d_i indicates the discrepancy between the rank of the MOS value and that of the predicted objective quality for the ith sonar image. M, M_c and M_d represent the number of test sonar images, and the numbers of concordant and discordant sonar image pairs in the testing set, respectively. These indices operate only on the ranks of the sample points without considering the relative distances between sample points. The second group of IQA benchmarks is relevant to prediction accuracy. The prediction accuracy is indicated by the Pearson linear correlation coefficient (PLCC) between the MOS values and the mapped objective scores Q_m, that is, Equation (17):

PLCC = \frac{\sum_{i} (Q_{m,i} - \bar{Q}_m)(s_i - \bar{s})}{\sqrt{\sum_{i} (Q_{m,i} - \bar{Q}_m)^2} \sqrt{\sum_{i} (s_i - \bar{s})^2}}, \quad (17)

where Q_{m,i} and \bar{Q}_m are the mapped objective score for the ith sonar image and the mean of all Q_{m,i}, and s_i and \bar{s} represent the MOS value of the ith sonar image and the mean of all s_i, respectively. The mapped objective score is obtained by the five-parameter logistic regression of Equation (18):

Q_m = \beta_1 \left( \frac{1}{2} - \frac{1}{1 + e^{\beta_2 (Q_p - \beta_3)}} \right) + \beta_4 Q_p + \beta_5, \quad (18)

where Q_p and Q_m are the predicted objective score from the corresponding IQA index and the mapped objective score after the non-linear regression, respectively. β_i, i = 1, 2, … , 5 are the parameters to be fitted during the curve fitting process. The IQA benchmark in the third group is the root mean square error (RMSE), as depicted in Equation (19), which quantifies the differences between Q_m and the MOS values for prediction consistency measurement:

RMSE = \sqrt{\frac{1}{M} \sum_{i=1}^{M} (Q_{m,i} - s_i)^2}. \quad (19)
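The four benchmarks can be computed with SciPy as sketched below. The five-parameter logistic mapping follows the common IQA convention; the initial parameter guesses are illustrative assumptions, not values from the paper.

```python
import numpy as np
from scipy.stats import spearmanr, kendalltau, pearsonr
from scipy.optimize import curve_fit

def logistic5(q, b1, b2, b3, b4, b5):
    """Five-parameter logistic mapping from raw predictions to MOS scale."""
    return b1 * (0.5 - 1.0 / (1.0 + np.exp(b2 * (q - b3)))) + b4 * q + b5

def iqa_benchmarks(pred, mos):
    """Return (SROCC, KROCC, PLCC, RMSE) for predicted scores vs. MOS values."""
    pred, mos = np.asarray(pred, float), np.asarray(mos, float)
    srocc = spearmanr(pred, mos).correlation
    krocc = kendalltau(pred, mos).correlation
    # Illustrative starting point for the non-linear regression.
    p0 = [np.ptp(mos), 0.1, float(np.mean(pred)), 0.1, float(np.mean(mos))]
    params, _ = curve_fit(logistic5, pred, mos, p0=p0, maxfev=20000)
    mapped = logistic5(pred, *params)
    plcc = pearsonr(mapped, mos)[0]
    rmse = float(np.sqrt(np.mean((mapped - mos) ** 2)))
    return srocc, krocc, plcc, rmse
```

SROCC and KROCC are computed on the raw predictions (rank-based measures are invariant to the monotone mapping), while PLCC and RMSE are computed after the logistic regression, mirroring the protocol described above.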
Based on the abovementioned four benchmarks, we compared the performance of our method with the state-of-the-art NR and RR IQA methods proposed for NSIs and screen content images (SCIs). Besides, SIQA methods for side-scan and acoustic lens sonar images were also compared with the CaMSIQA method. We implemented them through the source codes provided by the authors and adopted their default settings where feasible. The 11 selected NR IQA methods are BLind Image Integrity Notator using DCT Statistics II (BLIINDS II) [2], Accelerated Screen Image Quality Evaluator (ASIQE) [23], Blind/Referenceless Image Spatial QUality Evaluator (BRISQUE) [31], No-reference Free Energy based Robust Metric (NFERM) [39], AR-based Image Sharpness Metric (ARISM) [40], Blind Quality Measure for Screen content images (BQMS) [41], Integrated Local Natural Image Quality Evaluator (IL-NIQE) [42], Blind Pseudo Reference Image based quality measure (BPRI) [43], blind IQA method based on High Order Statistics Aggregation (HOSA) [44], Six-Step BLInd Metric (SISBLIM) [45] and NR IQA in the Curvelet domain (CurveletQA) [46]. Three widely cited RR IQA methods have been compared with the proposed method, namely the Reduced-reference Image Quality Metric for Contrast Change (RIQMC) [25], the Reduced-reference Wavelet-domain Quality Measure of Screen content images (RWQMS) [26] and the Quality Metric Contrast (QMC) [47]. The SIQA methods chosen for comparison are the No-Reference Contour Degradation Measurement (NRCDM) [18], the FR Sonar Image Quality Predictor (SIQP) [19] and the Partial-reference Sonar Image Quality Predictor (PSIQP) method [20].
Based on the previously mentioned IQA methods, we compared the performance of the selected IQA methods on the SIQD database, wherein 4/5 of the images are used for training and the rest are used for testing. There are 40 distorted sonar images in each group, every 20 of which share the same content. To exclude performance bias, the random choice of training and testing sets was repeated 1000 times. The average performance across the 1000 iterations is finally reported for comparison.
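The repeated content-separated 80/20 splitting can be sketched as follows; `random_splits` is a hypothetical helper, and the mapping from group indices to individual images is omitted.

```python
import numpy as np

def random_splits(n_groups=20, train_frac=0.8, iters=1000, seed=0):
    """Yield (train, test) group-index arrays for repeated random 80/20 splits.

    Splitting at the group level keeps images of the same content entirely
    in either the training or the testing set, avoiding content overlap.
    """
    rng = np.random.default_rng(seed)
    n_train = int(round(train_frac * n_groups))
    for _ in range(iters):
        perm = rng.permutation(n_groups)
        yield perm[:n_train], perm[n_train:]
```

Averaging a benchmark over all yielded splits then gives the reported mean performance across the 1000 iterations.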

Quantitative performance comparison
In this study, we utilize metric learning to measure the distance between the CIS model parameter tuples of the test and undistorted sonar images. In metric learning, the model parameters are first projected into a new subspace for dimensionality reduction, wherein the transform matrix is optimized on the training samples. We show that the proposed method driven by metric learning significantly outperforms conventional distance metrics. The results are tabulated in Table 2, wherein the better performances are highlighted in boldface. When metric learning is excluded from the framework, the Euclidean distance is applied directly to the parameters of the different models. From Table 2, it is obvious that applying metric learning leads to a performance gain of about 20% for the SROCC, 24% for the KROCC, 46% for the PLCC and 21% for the RMSE. The significant performance gain achieved by metric learning demonstrates its superiority, owing to its good separability for samples with different characteristics. By the design of its pipeline, the CaMSIQA method works in an NR manner. We now compare the performance of the proposed method with that of other NR IQA methods on the testing set, where the images and the distortion types are completely separated from the training set. The results are reported in terms of the selected performance metrics in Table 3, where the best and the next-best performances are boldfaced and underlined, respectively. From Table 3, we can conclude that CaMSIQA performs better than all of the selected state-of-the-art NR IQA algorithms. It should be emphasized that only the CaMSIQA method attains performance greater than 0.7 for the SROCC and the PLCC, which are the most common performance metrics. With respect to prediction consistency, the proposed method also achieves the best performance.
Compared with the second-ranking method in Table 3, the performance gain of the CaMSIQA method is approximately 11% for the SROCC, 8% for the KROCC, 9% for the PLCC and 2% for the RMSE. Moreover, Table 4 tabulates the performance metrics of the CaMSIQA method as well as the selected widely cited RR IQA methods. We can see from Table 4 that the proposed method performs best with regard to prediction monotonicity on the SIQD database. As for prediction accuracy and consistency, the RWQMS performs better than all the other competitors. The RWQMS method is an RR SIQA method, which needs 24 features as reference information. In practice, at least 480 bits are needed for an accurate representation of this reference information. The proposed method, in contrast, requires no reference, which is a very important improvement for underwater acoustic transmission due to the limited bandwidth of underwater acoustic channels. Considering Tables 3 and 4 comprehensively, CaMSIQA beats all competing IQA methods. The performance gain achieved is due to the fact that our method mainly concerns the global structures of images, which are directly related to the utilities of sonar images. In contrast, the most commonly used IQA methods for NSIs or SCIs were designed either based on statistics or on visual characteristics.
We have also conducted the F-test between the proposed work and the compared IQA methods. We define the critical threshold F_c according to the residuals of the compared IQA methods and a confidence level of 95%. The ratio of the prediction residual variances of the objective qualities with respect to the MOS values is expressed as F_t, where the objective qualities are regressed using the five-parameter logistic non-linear regression function. When F_t > F_c, the two IQA methods participating in the comparison can be regarded as significantly distinct. Table 5 tabulates the results of the significance comparison. In Table 5, "+1" represents that the CaMSIQA method is significantly better than the compared method, "0" shows that the performances of the CaMSIQA method and the compared method are statistically indistinguishable, while "−1" indicates that the CaMSIQA method is statistically worse than the compared method. It can be concluded that the CaMSIQA method is statistically better than the compared NR and RR IQA methods.

FIGURE 7 Performance comparisons of the SIQA methods with the CaMSIQA method
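The significance test described above can be sketched as follows. This is a simplified variance-ratio F-test at the 95% confidence level; the residuals are assumed to come from the logistic-regressed objective qualities, and the exact critical-value convention of the paper may differ.

```python
import numpy as np
from scipy.stats import f as f_dist

def f_test_significance(res_a, res_b, alpha=0.05):
    """Compare prediction residuals of two IQA methods via an F-test.

    Returns +1 if method A has a significantly smaller residual variance
    (A is better), -1 if significantly larger, and 0 if the two methods
    are statistically indistinguishable at the given level.
    """
    res_a = np.asarray(res_a, float)
    res_b = np.asarray(res_b, float)
    var_a, var_b = res_a.var(ddof=1), res_b.var(ddof=1)
    ft = max(var_a, var_b) / min(var_a, var_b)          # variance ratio F_t >= 1
    fc = f_dist.ppf(1.0 - alpha, res_a.size - 1, res_b.size - 1)  # critical F_c
    if ft <= fc:
        return 0                                        # not significantly distinct
    return 1 if var_a < var_b else -1
```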
To compare the performance of the SIQA methods, we exhibit the performance metrics indicating prediction monotonicity (the SROCC and the KROCC) and prediction accuracy (the PLCC) in a radar chart, as shown in Figure 7. There are three axes in the radar chart, for the SROCC, the PLCC and the KROCC. We mark the scale of the three performance metrics on the central axis. It can be clearly observed that the SIQP metric outperforms the other SIQA methods with regard to prediction monotonicity, and is comparable with the PSIQP metric concerning prediction accuracy on the selected test images. Since NR methods lose some prior knowledge of the content, they are universally inferior to the reference-based SIQA methods. However, due to the efficient distance measurement employed here, this study achieves performance comparable with the NRCDM metric at a lower computational complexity. It can be observed from Figure 7 that the symbol representing the NRCDM method coincides with the symbol of the CaMSIQA method. For a better presentation of the performance comparison, we also tabulate the performances of the existing SIQA methods in Table 6.
For exhaustive comparisons, we ran the selected NR methods as well as our method on the SIQD database, and tabulate the average run-time of each method in Table 7. These computational cost tests were conducted on the MATLAB R2019b software platform on a computer with a 3.60 GHz CPU and 64.00 GB RAM. The selected methods in Table 7 can be classified into four categories according to their computational costs. Methods in the first category take less than 10^{-3} s to process one sonar image; the BRISQUE and SISBLIM make up this category. The second category includes methods with an average run-time of less than 10^{-1} s, and our CaMSIQA falls into this category. Four methods can be classified into the third category since they all take less than 1 s for quality prediction per image. Methods in the last category take more than 1 s per sonar image; this category includes five methods. It is clear that the computational cost of the CaMSIQA method is in the leading position compared with most of the selected competing NR IQA methods. Only the BRISQUE metric and the SISBLIM metric take less processing time than the CaMSIQA method. Compared with these IQA methods of lower computational time, the proposed method obtains at least 11%, 8.9%, 10% and 2% performance gains in terms of the SROCC, the KROCC, the PLCC and the RMSE, respectively.

Qualitative performance comparison
To visually compare the performance of the proposed method and the competing IQA approaches, we illustrate the differences between the objective quality predicted by each selected method and the MOS value with an example of a distorted sonar image in Figure 8. Figure 8(a) shows the reference sonar image, which was captured by an acoustic lens sonar for a nurse shark during Fabien Cousteau's groundbreaking Mission 31 underwater study aboard the Aquarius. We exhibit the distorted version of this image, whose MOS value is 30.45, in Figure 8(b). The most typical distortions present in this sonar image are blur and non-eccentricity distortion caused by SPIHT compression, which can affect the identification of the rock and the shark. The quality evaluated by the CaMSIQA method is very close to the MOS value provided by the SIQD database. Furthermore, we also plot the objective qualities predicted by the selected methods in Figure 9, which have been mapped to the range [1, 100]. The horizontal axis of Figure 9 indicates the NR IQA methods, which are indexed according to Table 8. The vertical axis is the predicted quality. The MOS value of this sonar image is indicated by the red line. It is obvious that the proposed method is able to capture the impacts of these distortions on the utility quality and quantify them in the quality score. By contrast, the selected NR IQA methods all overestimate the quality of this image sample. This is because these IQA methods are designed for NSIs or SCIs without considering the visual characteristics based on the corresponding utility of sonar images.

CONCLUSION
We have proposed a contour-based approach for NR SIQA using a statistical model (i.e. the CIS model). Based on this model, the contour information is expressed as a set of parameters of a Beta fit. The quality of the test sonar image is quantified by measuring the distance between the parameter tuples of the CIS models of the test and the undistorted sonar images in a proper subspace. We have validated, using four commonly used performance metrics, that our method correlates highly with human judgement of the utility quality of sonar images. Besides, the proposed CaMSIQA method also behaves well in comparison with the state-of-the-art NR IQA methods, and is even highly competitive with the widely cited RR IQA methods. Furthermore, how to extend the CaMSIQA method to an unsupervised approach without performance loss is an interesting and challenging problem yet to be explored.