Completely blind image quality assessment via contourlet energy statistics

Correspondence: Chaofeng Li, Institute of Logistics Science & Engineering, Shanghai Maritime University, Shanghai, 200135 China. Email: wxlichaofeng@126.com

Abstract

An aim of completely blind image quality assessment (BIQA) is to develop algorithms that can grade image quality without any prior knowledge of the images. Here, a new contourlet energy statistics based completely blind, opinion-unaware BIQA (OU-BIQA) method is proposed, which can predict the perceptual severity of a range of image distortion types without requiring any prior knowledge. Based on the energy distribution of the contourlet sub-bands of natural images in the log-domain, the lower-scale (finer) sub-band energies of a distorted image can be predicted from its corresponding higher-scale (coarser) sub-band energies. A quality model is then constructed by quantifying the difference between the predicted and realistic energies. Meanwhile, an effective method for adjusting and compensating an undesired distortion is integrated into the quality model. Experimental results show that the proposed method outperforms state-of-the-art OU-BIQA models on the relevant portion of the TID2013 database, and is competitive on the LIVE IQA database. Moreover, the proposed model is very fast, suggesting a real-time solution to high-performance BIQA.


INTRODUCTION
Automatically assessing perceptual image quality plays an important role in the image processing field, impacting such tasks as image acquisition, compression, restoration, enhancement and reproduction. Generally, objective image quality assessment (IQA) can be classified into three categories according to the availability of the original image: full-reference IQA (FR-IQA), reduced-reference IQA (RR-IQA) and no-reference/blind IQA (NR-IQA/BIQA). FR-IQA and RR-IQA methods require complete or partial reference image information to evaluate image quality. However, in many practical applications the reference image is not available, making the development of BIQA methods highly desirable.
Current BIQA methods can be classified into two categories: 'distortion-aware' (DA) methods [1][2][3][4] and 'distortion-unaware' (DU) methods [5][6][7][8][9][10]. The former need to know the specific distortion type in advance and quantify the particular artefacts to estimate the quality of an image, and usually work well for one specific type of distortion. The latter, also referred to as general-purpose BIQA, do not exploit prior knowledge of the distortion category for quality estimation; this is a much more challenging task, but one that is potentially more practical in real-world applications, and it has attracted significant attention in the image processing field.
General-purpose DU BIQA methods can be further divided into two subcategories: opinion-aware (OA) and opinion-unaware (OU) methods. The former require database(s) of distorted images and associated subjective opinion scores for training; the latter do not require training on databases of human judgments of distorted images.
Generally, shallow learning-based methods follow a two-step procedure: feature extraction and model regression against human subjective scores. In [6], the authors proposed a neural network-based BIQA algorithm trained on perceptually relevant features, including the mean phase congruency image, the entropy of the phase congruency image, the entropy of the distorted image, and the gradient of the distorted image. The BRISQUE [7] index is a distortion-generic BIQA model that extracts scene statistics of locally normalised luminance coefficients in the spatial domain, after which a support vector regression (SVR) model is used to train and predict image quality. The BIQI [8] and DIIVINE [9] methods proposed by Moorthy et al. follow a two-stage framework that first uses a support vector machine (SVM) to detect the distortion type, and then uses an SVR model specialised to that distortion for BIQA. The authors of [10] proposed a single-stage NR-IQA algorithm called the BLIINDS index, which predicts image quality based on the statistics of local image discrete cosine transform (DCT) coefficients. The BLIINDS-II [11] index extends the BLIINDS index with more natural scene statistic (NSS)-based DCT features. CORNIA [13] uses raw image patches extracted from a set of unlabelled images to learn a dictionary in an unsupervised manner. The most widely used algorithm for regression is the SVR with a radial basis function kernel. The authors of [17] built a new NR-IQA model, NRVPD, by learning a prior knowledge database. In NR-IQA methods, FR-IQA models can be used as weak supervision indicators to make the regressor more robust. A successful example is that the authors of [18] and [19] utilised big data technology to extract perceptual features of natural, screen content and enhanced images, and used some full-reference image quality measures (FR IQMs) to obtain the labels of training data for conducting quality regression.
With the rapid development of deep learning technology, some convolutional neural network (CNN)-based BIQA methods have been proposed. For example, the authors of [23] proposed a CNN-based BIQA model and achieved results competitive with FR-IQA methods. In [24], deep perceptual features are extracted via a CNN for predicting quality, and end-to-end deep learning-based BIQA models are built in [22,25]. The authors of [26] proposed a deep-learning approach, the deep image quality metric (DIQM), to learn global image quality. Although these deep learning-based methods surpass shallow learning-based methods in performance, they still need databases of human-rated distorted images for training, and such learning-based BIQA models are necessarily limited.
Considering the disadvantages of OA BIQA methods, it is necessary to develop OU IQA models. In recent years, a few OU BIQA algorithms have been proposed. The authors of [27] developed a no-training algorithm called the no-reference quality index (NRQI) using a complementary set of perceptual features; it can handle multiple distortions without any training on human scores, but it needs to differentiate noise from other distortion categories. In [28], a quality-aware clustering (QAC) method was presented, which first partitions the distorted image into overlapping patches, uses a percentile pooling strategy to estimate the local quality of each patch, and then learns a set of centroids at each quality level. These centroids are then used as a codebook to infer the quality of each patch in a given image, from which a perceptual quality score of the whole image can be obtained. QAC requires some reference and distorted images, but no human scores. The authors of [29] proposed the natural image quality evaluator (NIQE) by constructing a 'quality aware' collection of statistical features based on a simple and successful spatial-domain NSS model, providing an NSS-based modelling framework for training-free OU BIQA model design. Zhang et al. [30] integrated multiple carefully selected quality-aware features to learn a multivariate Gaussian (MVG) model of pristine images, and proposed an extended 'completely blind' OU BIQA model called integrated local NIQE (IL-NIQE). The authors of [31] designed and extracted three kinds of NSS features to characterise structure, naturalness and perceptual quality, respectively, and learned a pristine MVG model with these quality-aware features, creating an OU BIQA model named structure, naturalness and perception quality-driven NIQE (SNP-NIQE), which also follows the design philosophy of NIQE. However, until now, OU BIQA models have not achieved quality prediction performance comparable with state-of-the-art OA BIQA models.
In this study, we study the sub-band energies of the coefficients of contourlet transforms of natural images and find that they show a linear distribution with scale in the log-domain, but this law is destroyed for distorted images. The higher-scale sub-band energies of distorted images are affected little by distortion, so they are used to predict the corresponding lower-scale sub-band energies. A quality model is then constructed by quantifying the difference between predicted and realistic energies. Meanwhile, an effective model for adjusting and compensating an undesired distortion is integrated into the quality model. We call the resulting algorithm the blind contourlet energy statistics (BCES) IQA index.
Our main contributions can be summarised as follows.
1. We investigate the sub-band energies of the coefficients of contourlet transforms of natural images and find that they show a linear distribution with scale in the log-domain, and that the higher-scale sub-band energies of distorted images are affected little by distortion.
2. We propose a new contourlet energy statistics based completely blind OU-BIQA method, constructed by quantifying the difference between predicted and realistic energies, into which an effective model for adjusting and compensating the undesired JPEG distortion category is integrated.
The study is organised as follows. Section 2 introduces energy statistical features of images in the contourlet transform, and also analyses the distributions of sub-band coefficients of natural and distorted images. The overall OU BIQA model is described in Section 3. Section 4 discusses and analyses the experimental results on the LIVE and TID2013 IQA databases. Finally, conclusions are drawn in Section 5.

ENERGY STATISTIC ANALYSIS OF IMAGES IN THE CONTOURLET DOMAIN
Contours and curves projected from surfaces to images are perceptually relevant, and have intrinsic geometrical properties [32]. The directional multi-resolution contourlet transform (CT) efficiently captures and represents singularities along smooth object boundaries in images. As we shall see, we can make use of the non-linear statistical dependencies that exist between the CT coefficients to assess changes in image quality arising from image degradations.

A brief introduction of CT
The CT, or pyramidal directional filter bank (DFB), was proposed by Do and Vetterli [33] in 2002. An efficient representation of image structure, it decomposes a 2-D image into low-pass and high-pass sub-bands using a Laplacian pyramid (LP), thereby highlighting singularities. A DFB is then applied to the high-pass sub-bands to capture singular points distributed along the same direction. Figure 1 illustrates the filter structure of the CT. Figure 2 shows an example of the contourlet decomposition of an image using a three-level LP and an eight-direction DFB. The CT can thus effectively capture the smooth contours and geometric structures of images. From the perspective of the human visual system (HVS), the characteristics of the receptive fields in the visual cortex are also localised, oriented and band-pass. Therefore, the features obtained by the CT can be employed to reflect features of visual perception.
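To make the pyramid stage concrete, the following minimal Python sketch builds a Burt-Adelson-style Laplacian pyramid with Gaussian filtering. It is a simplified stand-in for the LP stage of the CT and omits the DFB stage entirely, so it should be read as an illustration rather than as the filter bank of [33].

```python
import numpy as np
from scipy.ndimage import gaussian_filter, zoom

def laplacian_pyramid(image, levels=3, sigma=1.0):
    """Simplified Laplacian pyramid: each level keeps the band-pass
    residual between the current image and a blurred, down/up-sampled
    copy. A stand-in for the LP stage of the contourlet transform;
    the directional filter bank stage is omitted."""
    bands = []
    current = np.asarray(image, dtype=np.float64)
    for _ in range(levels):
        low = gaussian_filter(current, sigma)[::2, ::2]   # blur + downsample
        up = zoom(low, 2, order=1)[:current.shape[0], :current.shape[1]]
        bands.append(current - up)                        # band-pass residual
        current = low
    bands.append(current)                                 # coarsest low-pass
    return bands
```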

Energy statistics of natural images in contourlet domain
The efficient, low-redundancy CT decomposes images into directional sub-bands whose coefficients reflect underlying statistical features of natural, photographic images. The authors of [34] pointed out that the spectra of images of natural scenes are linear on a logarithmic scale and that the responses of visual cortical cells are matched to this characteristic. Accordingly, we employ statistical features derived from logarithms of CT coefficients. The logarithmic energy of the sub-band coefficients is computed by the following formula:

$$E_{s,o} = \log\left(\frac{1}{N}\sum_{i=1}^{N} c_{s,o}^2(i)\right), \tag{1}$$

where s is the scale index, o indexes orientation, c_{s,o}(i) is the i-th coefficient of sub-band (s, o), and N is the number of sub-band coefficients. Here we use four scales and eight orientations in the CT. We applied the contourlet transform to all 125 natural images in the Berkeley image database [35], obtaining 32 sub-band energies from each image. A randomly selected set of 16 of these images is shown in Figure 3. The Berkeley database provides many more reference images than standard IQA databases, which is why it is used in this study. Figure 4 shows the distribution of sub-band energy, strongly suggesting an approximately linear relationship between mean sub-band energy and sub-band frequency; in Figure 4, the four scales are ordered 4th, 3rd, 2nd, 1st, and the eight orientation energies at each scale are reordered as 7th, 2nd, 6th, 3rd, 8th, 1st, 5th, 4th.
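As a concrete illustration, the sketch below computes these log-energies from a set of sub-band coefficient arrays. The coefficients are assumed to come from some contourlet toolbox (a hypothetical `contourlet_decompose` helper is referenced later; it is not a real library call), and the log base is left as the natural log, since only the log-domain behaviour matters for the linearity observation.

```python
import numpy as np

def subband_log_energies(coeffs):
    """Compute the log-energy E_{s,o} of each contourlet sub-band.

    `coeffs` is a nested list: coeffs[s][o] is the 2-D coefficient
    array at scale s and orientation o, as produced by a contourlet
    decomposition (assumed here to come from a hypothetical
    `contourlet_decompose` helper, not a real library call).
    """
    energies = []
    for scale in coeffs:
        row = []
        for band in scale:
            n = band.size                   # number of coefficients N
            # E_{s,o} = log( (1/N) * sum_i c_{s,o}(i)^2 )
            row.append(np.log(np.sum(band.astype(np.float64) ** 2) / n))
        energies.append(np.asarray(row))
    return energies  # energies[s][o] gives E_{s,o}
```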

Energy statistics of distorted images in contourlet domain
We also analysed the contourlet energy statistics on the popular LIVE IQA database [36], which consists of 29 reference images and 779 distorted images spanning various distortion categories: JPEG compression (JPEG), JPEG2000 (JP2K) compression, white noise (WN), Gaussian blur (Gblur), and a Rayleigh fading channel (FF). All of the pristine and distorted images from each of the JP2K, JPEG, WN, Gblur and FF distortion categories were subjected to the CT, and their sub-band energy distributions are shown in Figures 5(a) to (e). It can be seen that for the JP2K, WN, Gblur and FF distortion categories, as compared with natural images, the sub-band energies at fine scales are altered, but those at coarse scales change little (with some exceptions for heavily distorted images). The contourlet energies of the distorted images at fine scales exhibit significant variance, with the exception of JPEG distortion: while JPEG compression leads to the loss of high-frequency image information, it also creates energy from emergent block boundaries, so overall the energies of JPEG images change little.
As shown in Figure 5, the coarse-scale sub-band energies of distorted images are mostly unchanged, while the sub-band energies at fine scales are clearly affected (other than for JPEG distortion). This suggests that the coarse-scale sub-band energies of a distorted image can be used to predict its fine-scale sub-band energies, and image quality can then be estimated by comparing the destroyed sub-band energies with the predicted sub-band energies. For JPEG pictures, however, additional processing is needed, as described below.

BIQA ALGORITHM USING CONTOURLET ENERGY STATISTICS
Following the analysis in Section 2, we propose a new BIQA model using contourlet energy statistics, as follows.

Calculating the predicted matrices
In order to analyse the relationships between sub-band energies of natural images across scales, we used a set of 125 pristine images from the Berkeley image segmentation database [35], with sizes ranging from 480 × 320 to 1280 × 720. We ensured that their content does not overlap with the images in the IQA databases used in our experiments [36,37]. The images may be viewed in [38]. These images were decomposed using a CT over four scales and eight orientations, yielding a set of sub-band energies E. The predicted matrices over scales were calculated as

$$M_{N_s} = E_{N_4}^{+} E_{N_s}, \quad s = 1, 2, 3, \tag{2}$$

where E_N1, E_N2 and E_N3 are the sub-band energies of the three finest scales, E_N4 are the sub-band energies at the coarsest 4th scale, E_N4^+ denotes the Moore-Penrose pseudo-inverse of E_N4, and M_N1, M_N2 and M_N3 are the predicted (empirical) correlation matrices between the coarse 4th scale and the other finer scales.
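A sketch of this fitting step follows, assuming each M_Ns is the least-squares mapping from the eight coarsest-scale energies to the eight energies at scale s across the 125 training images; the least-squares form is an assumption, chosen to be consistent with the matrix dimensions described in the text.

```python
import numpy as np

def fit_predicted_matrices(E):
    """Fit predicted matrices M_N1, M_N2, M_N3 from natural-image energies.

    E is a (125, 4, 8) array: E[i, s, o] is the log-energy of image i,
    scale s (0 = finest, 3 = coarsest), orientation o.  Each M_Ns maps
    the 8 coarsest-scale energies to the 8 energies at a finer scale;
    the least-squares form below is an assumption consistent with the
    dimensions described in the text.
    """
    E_N4 = E[:, 3, :]                               # (125, 8) coarsest scale
    pinv = np.linalg.pinv(E_N4)                     # (8, 125) pseudo-inverse
    return [pinv @ E[:, s, :] for s in range(3)]    # three (8, 8) matrices
```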

Heavy distortions
As shown in Figures 5(d) and (e), when the perceptual distortion of an image is very heavy, the sub-band energies of the coarse scale (the 4th scale) are significantly affected. When applying these degraded coarse-scale sub-band energies to predict sub-band energies at finer scales, errors may occur. To reduce these errors, we adjust heavily degraded energies at the coarse scale as follows:

$$E_{D_4} = \begin{cases} \mathrm{Min}(E_{N_4}), & \text{if } \mathrm{Avg}(E_{D_4}) < T, \\ E_{D_4}, & \text{otherwise,} \end{cases} \tag{3}$$

where E_D4 is the vector of 4th-scale sub-band energies of the degraded image, Avg(E_D4) denotes their average, and T is a threshold value. Here, we use the minimum of the average 4th-scale energies among the 125 natural images as the automatically computed threshold. E_N4 is a 125 × 8 matrix, and Min(E_N4) is the orientation-wise minimum of the 4th-scale sub-band energies over the 125 images, which is used to compensate for heavily distorted images.
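A sketch of this compensation step, transcribing the rule above under the assumption that the natural-image training energies are available:

```python
import numpy as np

def compensate_heavy_distortion(E_D4, E_N4_train):
    """Replace severely degraded coarse-scale energies.

    E_D4: (8,) 4th-scale energies of the distorted image.
    E_N4_train: (125, 8) 4th-scale energies of the natural images.
    If the average degraded energy falls below the smallest average
    natural energy T, substitute the orientation-wise minimum of the
    natural energies (a sketch of the compensation rule in the text).
    """
    T = E_N4_train.mean(axis=1).min()       # threshold from natural images
    if E_D4.mean() < T:                     # heavy distortion detected
        return E_N4_train.min(axis=0)       # orientation-wise minimum
    return E_D4
```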

Sub-band degraded energy prediction
The predicted sub-band energies E_P1, E_P2, E_P3 of a distorted image at the three finer scales are calculated from the empirical correlation matrices M_N1, M_N2, M_N3 and the sub-band energies of the distorted image at the 4th scale as follows:

$$E_{P_s} = E_{D_4} M_{N_s}, \quad s = 1, 2, 3, \tag{4}$$

where E_D4 are the sub-band energies of the distorted image at the coarsest scale, and E_P1, E_P2 and E_P3 are vectors denoting the predicted sub-band energies at the three finer scales, respectively.
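The prediction step itself is a single matrix-vector product per scale; a minimal sketch, reusing the matrices fitted in the sketch above:

```python
import numpy as np

def predict_fine_scale_energies(E_D4, M_list):
    """Predict fine-scale energy vectors from the (possibly compensated)
    4th-scale energies of a distorted image, per Equation (4):
    E_Ps = E_D4 @ M_Ns for s = 1, 2, 3."""
    return [E_D4 @ M for M in M_list]   # three (8,) vectors
```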
In order to test the effectiveness of Equation (4), we used the 29 undistorted (reference) images from the LIVE IQA database, predicting the sub-band energies at the three finer scales from the sub-band energies of the 4th scale and the predicted matrices M_N1, M_N2 and M_N3. We then calculated the mean absolute error (MAE) and mean relative error (MRE) between the predicted and actual energies, as reported in Table 1.

FIGURE 6 Scatter plots of Q_1 predicted quality scores against differential mean opinion scores (DMOS) on the LIVE image quality assessment (LIVE IQA) database

A contourlet energy blind IQA index
By using E_C1, E_C2, E_C3 as the measured sub-band energies of the distorted image at the three finer scales, and by comparing them with the corresponding predicted sub-band energies E_P1, E_P2 and E_P3 from Equation (4), we construct a preliminary IQA index as follows:

$$Q_1 = \frac{1}{3}\sum_{s=1}^{3}\frac{1}{8}\sum_{o=1}^{8}\left|E_{C_s}(o) - E_{P_s}(o)\right|.$$
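A sketch of the preliminary index, under the mean-absolute-difference form assumed in the equation above; larger values indicate a stronger deviation from natural-image statistics.

```python
import numpy as np

def q1_index(E_C, E_P):
    """Preliminary quality index: average absolute gap between the
    measured fine-scale energies E_C[s] and the predicted energies
    E_P[s]. The mean-absolute-difference form is an assumption."""
    return float(np.mean([np.mean(np.abs(c - p)) for c, p in zip(E_C, E_P)]))
```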

Energy compensation for JPEG distortion
From Figure 5(b), it is apparent that the sub-band energy distribution of JPEG-distorted images is nearly unchanged relative to natural images. Scatter plots of Q_1 predicted quality scores against subjective differential mean opinion scores (DMOS) on the LIVE IQA database are given in Figure 6; it can be seen that the Q_1 index cannot capture the correlation between predicted and subjective scores on the JPEG distortion category. In order to make the proposed OU BIQA model applicable to JPEG distortion, an energy compensation for JPEG distortion is needed.
JPEG is a block DCT-based lossy image coding technique. Wang and Bovik [4] devised an effective way to capture both the blurring and blocking effects of JPEG distortion using the difference signal along each horizontal and vertical line. Let x(m, n), m ∈ [1, M], n ∈ [1, N], denote the image; the horizontal difference signal is

$$d_h(m, n) = x(m, n+1) - x(m, n),$$

and the vertical difference signal d_v is defined analogously. From these, EaB_h and EaB_v are the average absolute differences across block boundaries along rows and columns, and EiB_h and EiB_v are the average absolute neighbouring-pixel differences (within blocks) along rows and columns. The zero-crossing (ZC) rates ZC_h and ZC_v are estimated as the rates of sign changes of d_h and d_v, respectively. The overall set of JPEG-sensitive features is

$$B = \frac{EaB_h + EaB_v}{2}, \quad A = \frac{EiB_h + EiB_v}{2}, \quad Z = \frac{ZC_h + ZC_v}{2}.$$

The authors of [4] define a JPEG quality index as

$$Q_2 = \alpha + \beta\, B^{\gamma_1} A^{\gamma_2} Z^{\gamma_3},$$

where α, β, γ_1, γ_2 and γ_3 are the model parameters. We used the same values as [4], namely α = −245.9, β = 261.9, γ_1 = −0.0240, γ_2 = 0.0160 and γ_3 = 0.0064, respectively.
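A compact sketch of this JPEG index follows. It is a simplified transcription of [4] (in particular, the in-block activity term is approximated by the plain within-block average difference rather than the exact normalisation of the original paper), using the published parameter values; it should not be taken as the authors' reference implementation.

```python
import numpy as np

def jpeg_quality_q2(img, block=8):
    """Sketch of the Wang-Bovik no-reference JPEG index [4].

    img: 2-D grayscale array. Computes blockiness B, activity A and
    zero-crossing rate Z from horizontal/vertical difference signals,
    then combines them with the published model parameters. This is a
    simplified transcription, not the authors' reference code.
    """
    x = img.astype(np.float64)

    def features(a):
        d = a[:, 1:] - a[:, :-1]                 # difference signal
        # average absolute difference across 8-pixel block boundaries
        eab = np.abs(d[:, block - 1::block]).mean()
        # average absolute difference within blocks (all minus boundaries)
        mask = np.ones(d.shape[1], dtype=bool)
        mask[block - 1::block] = False
        eib = np.abs(d[:, mask]).mean()
        # zero-crossing rate of the difference signal
        zc = (d[:, :-1] * d[:, 1:] < 0).mean()
        return eab, eib, zc

    eab_h, eib_h, zc_h = features(x)             # horizontal direction
    eab_v, eib_v, zc_v = features(x.T)           # vertical direction
    B = (eab_h + eab_v) / 2
    A = (eib_h + eib_v) / 2
    Z = (zc_h + zc_v) / 2
    alpha, beta, g1, g2, g3 = -245.9, 261.9, -0.0240, 0.0160, 0.0064
    return alpha + beta * (B ** g1) * (A ** g2) * (Z ** g3)
```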

Contourlet energy statistics based IQA index
We define the BCES index by adding the JPEG term Q_2 to the previously defined index Q_1:

$$Q = Q_1 + w\,Q_2,$$

where w is a weight factor, set to 0.07.
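Putting the pieces together, a minimal end-to-end sketch; `contourlet_decompose` remains a hypothetical helper standing in for a contourlet toolbox, the scale ordering matches the earlier sketches (index 3 = coarsest), and w = 0.07 as in the text.

```python
def bces_index(img, M_list, E_N4_train, w=0.07):
    """Sketch of the full BCES pipeline for one grayscale image,
    reusing the helper functions from the earlier sketches.
    `contourlet_decompose` is a hypothetical helper (not a real
    library call) returning coeffs[s][o] arrays over 4 scales and
    8 orientations, with scale index 3 the coarsest."""
    coeffs = contourlet_decompose(img, n_scales=4, n_orients=8)
    E = subband_log_energies(coeffs)                    # measured energies
    E_D4 = compensate_heavy_distortion(E[3], E_N4_train)
    E_P = predict_fine_scale_energies(E_D4, M_list)     # Equation (4)
    q1 = q1_index(E[:3], E_P)                           # contourlet term
    q2 = jpeg_quality_q2(img)                           # JPEG compensation
    return q1 + w * q2                                  # BCES score
```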

Testing on the LIVE IQA database
We tested the proposed BCES index on the LIVE IQA database, which consists of 29 different reference images and 779 distorted images from five distortion categories (JP2K, JPEG, WN, Gblur and FF), along with the associated DMOS, which represent human judgments of image quality. Two performance measures were used to evaluate the algorithm. The first is the Spearman rank-order correlation coefficient (SROCC), which evaluates the prediction monotonicity of the quality metric. The second is the linear correlation coefficient (LCC), which evaluates prediction accuracy. For the LCC, the logistic function specified by the Video Quality Experts Group was used to fit the model predictions to the subjective data. Values close to 1 for SROCC and LCC indicate superior correlation with human perception.
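A sketch of this evaluation protocol follows. The four-parameter logistic below is one common VQEG-style choice and is an assumption here, since the exact variant used in the experiments is not reproduced in the text.

```python
import numpy as np
from scipy.optimize import curve_fit
from scipy.stats import spearmanr, pearsonr

def evaluate(pred, dmos):
    """SROCC on raw predictions; LCC after a logistic fit of the
    predictions to DMOS. The 4-parameter logistic is one common
    VQEG-style choice, assumed here for illustration."""
    def logistic(q, b1, b2, b3, b4):
        return b2 + (b1 - b2) / (1.0 + np.exp(-(q - b3) / np.abs(b4)))
    p0 = [dmos.max(), dmos.min(), float(np.median(pred)), 1.0]
    params, _ = curve_fit(logistic, pred, dmos, p0=p0, maxfev=20000)
    fitted = logistic(pred, *params)
    srocc = spearmanr(pred, dmos).correlation   # monotonicity
    lcc = pearsonr(fitted, dmos)[0]             # accuracy after mapping
    return srocc, lcc
```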
Experimental results of BCES are reported in Tables 2 and 3. For comparison, we also give the published results of the top-performing OU BIQA algorithms QAC [28], NIQE [29], IL-NIQE [30] and SNP-NIQE [31]. In addition, we selected three classic OA-BIQA algorithms, DIIVINE, BRISQUE and BLIINDS-II, for comparison, where the section of the TID2013 database that includes the same distortion types as the LIVE database was used for training; these results are also tabulated in Tables 2 and 3. The top two performing models are highlighted in bold font.
From Tables 2 and 3, it can be seen that the proposed BCES index achieves the best performance on the difficult FF distortion category, while obtaining competitive results on the JP2K, WN and Gblur distortion categories as compared with other 'completely blind' IQA algorithms.
Because of the added JPEG energy compensation term, the overall model also performed quite well on the JPEG distortion; without the additional term, the SROCC/LCC was only 0.23/0.25. BCES is also competitive with the high-performing SNP-NIQE, IL-NIQE and NIQE on average and on single distortion categories. Since the authors of SNP-NIQE did not publish LCC scores, they do not appear in Table 3.
We also list the results of the three classic supervised OA-BIQA methods DIIVINE, BRISQUE and BLIINDS-II, again using the part of the TID2013 database having the same distortion types as the LIVE database for training, then testing on the entire LIVE IQA database. From Tables 2 and 3, it is apparent that supervised OA-BIQA methods attain only limited generalisation capability under cross-database validation, and BCES outperformed them on the LIVE IQA database. Figure 7 shows scatter plots of predicted quality scores against subjective DMOS for BCES, IL-NIQE, NIQE and QAC. Since the authors of SNP-NIQE have not released their source code, we are unable to show scatter plots of SNP-NIQE on the LIVE IQA database. It may be observed that the performance of QAC across distortion categories deviates much more than those of BCES, NIQE and IL-NIQE. Since NIQE and IL-NIQE deliver lower scores on higher-quality images, their scatter plots have positive slopes.

Testing on the TID2013 database
In order to further evaluate the performance of the proposed BCES, we tested the same algorithm on the same distortions (additive Gaussian noise (AGN), Gblur, JPEG, JP2K, JPEG transmission errors (JPTE) and JP2K transmission errors (J2TE)) in the TID2013 database [37]. The SROCC and LCC scores are listed in Tables 4 and 5, where the top two performing models are indicated in bold font. From Tables 4 and 5, we see that the proposed BCES index attained the best results on average across all distortion categories among the OU-BIQA algorithms. It also correlated well with MOS scores on most distortion categories. BCES and SNP-NIQE attained 'top two' performances on most distortion categories among the 'completely blind' OU-BIQA algorithms. Again, the LCC scores of SNP-NIQE are unavailable and cannot be listed in Table 5.
For the classic supervised OA-BIQA models DIIVINE, BRISQUE and BLIINDS-II, we used the entire LIVE IQA database for training and tested on the relevant part of the TID2013 dataset. As seen from Tables 4 and 5, the BCES index outperformed DIIVINE, BRISQUE and BLIINDS-II on average and on the single JPEG, JP2K and JPTE distortion categories. Figure 8 shows scatter plots of predicted quality scores against MOS for the top representative models (BCES, IL-NIQE, NIQE and QAC) on the same portion of the TID2013 database. One may observe that the performances of NIQE and IL-NIQE across distortion categories deviated much more than that of BCES.

Statistical significance and hypothesis testing
In order to ascertain whether the apparent differences observed in IQA performance are statistically significant, we applied a variance-based hypothesis test (HT) using the residuals between DMOS and the quality predicted by the IQA (after the nonlinear mapping). This was done on the LIVE IQA database and the portion of the TID2013 database corresponding to distortions in the LIVE IQA database.
The test we used is based on an assumption of Gaussianity of the residual differences between the IQA predictions and DMOS. It uses the F-statistic to compare the variances of two sets of sample points. The null hypothesis is that the residuals from one IQA model come from the same distribution and are statistically indistinguishable (with 95% confidence) from the residuals of another IQA model. The results of the hypothesis test using the residuals between the DMOS and IQA predictions are shown in Tables 6 and 7.
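A sketch of this test on the prediction residuals of two models, assuming a two-sided F-test on the residual variances at the 95% level (one standard way to implement the comparison described above):

```python
import numpy as np
from scipy.stats import f as f_dist

def f_test_residuals(res_a, res_b, alpha=0.05):
    """Compare residual variances of two IQA models with an F-test.
    Returns +1 if model A has significantly smaller residual variance
    (significantly better), -1 if significantly larger (worse), and
    0 if statistically indistinguishable. Assumes Gaussian residuals,
    as stated in the text."""
    var_a = np.var(res_a, ddof=1)
    var_b = np.var(res_b, ddof=1)
    F = var_a / var_b
    dfa, dfb = len(res_a) - 1, len(res_b) - 1
    lo = f_dist.ppf(alpha / 2, dfa, dfb)        # lower critical value
    hi = f_dist.ppf(1 - alpha / 2, dfa, dfb)    # upper critical value
    if F < lo:
        return +1                               # A significantly better
    if F > hi:
        return -1                               # A significantly worse
    return 0
```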
In Tables 6 and 7, a value of '1' (highlighted in red) indicates that the model in the row is significantly better than the model in the column, while a value of '-1' (highlighted in grey) indicates that the model in the row is significantly worse than the model in the column. A value of '0' indicates that the row and column models are statistically indistinguishable (equivalent). As can be seen, BCES and NIQE obtained the best statistical significance on the LIVE IQA database, but on the

Computational analysis
In many practical applications, measuring the quality of an input image online is needed, so speed is another important factor when evaluating a BIQA method. Table 8 lists the runtimes of several OU-BIQA algorithms on each 512 × 768 image, measured on a 3.2-GHz processor with 8 GB of RAM running Windows 10 and MATLAB R2017b. BCES was the fastest model, suggesting a real-time solution to high-performance BIQA. While we could not measure SNP-NIQE directly, its creators have shown that its runtime is far longer than that of NIQE.

CONCLUSION
We have developed a 'completely blind' OU-BIQA model that can assess image quality across a range of distortion categories without requiring any knowledge of anticipated distortions or human opinions of them. The quality metric is constructed by quantifying the difference between the predicted and realistic sub-band energies of CTs, with an energy compensation for JPEG distortion integrated into the quality model. The proposed method is competitive with current top OU-BIQA models, and is faster, suggesting a real-time solution to high-performance BIQA. In future work, complex contourlet energy statistics will be considered to improve the BCES model. Recently, Gu et al. [38] proposed a picture-based predictor of PM2.5 concentration (PPPC) algorithm that successfully estimates particulate matter (PM2.5), and we will adapt BCES to PM2.5 prediction.