A generalized quality assessment method for natural and screen content images

Funding information: Fundamental Research Grant Scheme; Ministry of Higher Education, Malaysia, Grant/Award Number: F02/FRGS/2024/2020

Abstract: A generalized objective quality assessment method is proposed for natural images and screen content images. Since natural images and screen content images have different statistical properties, modelling a generalized quality assessment method that works for both types of images is complicated because some of their properties conflict with one another. The proposed method assesses the perceptual quality of an image based on edge magnitude and direction. An image is first separated into regions with high and low gradients; gradient is used due to the small perceptual span of the human visual system for textual content. For high gradient regions, a small Prewitt operator kernel is used to obtain the gradient magnitude and direction, while a bigger Prewitt kernel is utilized for low gradient regions. Visual quality indices are computed from both regions and pooled to obtain the final quality index. The performance comparison shows that the proposed method assesses the perceived quality of natural images and screen content images with high accuracy.


INTRODUCTION
The use of screen content images (SCIs) has become widespread in recent years. Many activities in our daily lives involve the use of SCIs; some examples are remote computing, screen sharing, cloud computing and gaming [1][2][3][4]. Since these activities involve the transmission, reception and compression of SCIs, the quality of SCIs has to be maintained throughout these activities for a better quality of experience for the end users. Quality assessment (QA) of an SCI is generally more complex than that of a natural image (NI), as an SCI consists of a mixture of text, graphical content and NIs. Theoretically, these different types of content have to be assessed separately as they have different image properties.
Most of the existing image QA methods are devised and tested on NIs only. This is due to the ubiquity of NIs displayed by digital gadgets over the past two decades. In most existing applications, NIs are displayed and perceived by humans; assessing the quality of NIs is therefore important for improving and sustaining the quality of experience and service of the end users. Some of the popular methods are structural similarity (SSIM) [5], multiscale SSIM (MSSIM) [6], information weighted SSIM (IWSSIM) [7], feature similarity (FSIM) [8], gradient magnitude similarity deviation (GMSD) [9], visual saliency-based index (VSI) [10] and visual information fidelity (VIF) [11]. They are specifically designed for NIs and perform well in assessing the quality of NIs. SSIM assesses image quality by considering the similarity of luminance, contrast and structure of the images. MSSIM extends SSIM by incorporating multiscale information, while IWSSIM extends SSIM with a new weighting method based on the content information. FSIM incorporates the similarities of phase congruency and gradient magnitude, and has an extension for colour images. Based on the gradient magnitude, GMSD is one of the best performing QA methods for NIs with low computational complexity; instead of computing the final index as the average of the similarity indices, GMSD uses their standard deviation. VSI utilizes visual saliency maps for measuring quality distortions, with the saliency map also acting as a weighting function for computing the final quality index. Another well-performing method is VIF.

(This is an open access article under the terms of the Creative Commons Attribution License, which permits use, distribution and reproduction in any medium, provided the original work is properly cited. © 2020 The Authors. IET Image Processing published by John Wiley & Sons Ltd on behalf of The Institution of Engineering and Technology.)
It involves the use of natural scene statistics (NSS) and utilizes steerable pyramid decomposition for modelling distortions. Yet, as will be shown in Section 3, the performances of all these methods drop when they are used to assess SCIs. This is mainly due to the highly different statistical properties of SCIs and NIs: SCIs have thin lines, limited colour variations, simple shapes and sharp edges [12,13], whereas NIs have thick lines, random colour variations, complex shapes and smooth edges. These differences cause the existing QA methods to perform well only on NIs but not on SCIs.
Despite the existence of QA methods for NIs, there are also QA methods specifically designed for assessing the quality of SCIs. Most of these methods use edge information as the basis of QA because the human visual system (HVS) is sensitive to the presence of edges in an image [14]. Yang et al. proposed SCI perceptual quality assessment (SPQA) [15] to measure the quality of SCIs. The SPQA index is composed of the similarity of sharpness for pictorial regions and the similarities of sharpness and luminance for textual regions. The final quality index combines the similarity scores through a new weighting strategy based on correlation analysis of the subjective scores. To assess the quality of SCIs more accurately, Gu et al. proposed the structural variation-based quality index (SVQI) [16], which measures global structure variations through entropy values and complexity, and local structure variations through edge and corner variations. By weighting the gradient magnitude with saliency maps, Gu et al. also put forward a saliency-guided quality measure of SCIs (SQMS) [17], in which the saliency maps are generated from fixation- and saccade-inspired kernels.
Apart from the works of Gu et al., Wang et al. improved their popular SSIM method and proposed the SCI quality index (SQI) [18] for assessing the quality of SCIs. SQI measures SSIM indices on textual or pictorial regions with different sizes of Gaussian kernels; the textual and pictorial regions of an SCI are differentiated based on thresholds of information content. Another QA method for SCIs is the structure-induced quality metric (SIQM) [19], which weights the SSIM indices of an image with a structural degradation measure. Inspired by the sensitivity of the HVS towards high-frequency regions, Fang et al. proposed a method based on structural features and uncertainty weighting (SFUW) [20]. Similar to SPQA [15] and SQI [18], the quality is assessed in terms of textual and pictorial regions: for textual regions, the similarities of horizontal and vertical gradients are computed, while luminance and local binary pattern (LBP) [21] similarities account for the quality of pictorial regions. The final quality score is based on an uncertainty weighting computed from the entropy of the gradient information. Fu et al. utilized a multiscale difference of Gaussian for the SCI QA task and put forward MDoG [22]; through DoG, MDoG computes the final quality index of an SCI by weighting edge similarity with edge strength. Focusing on gradient magnitude and direction similarities, Ni et al. proposed a gradient similarity related index (GSS) [23], in which the gradient direction is extracted from local information of the gradient magnitude and the final index is the result of deviation-based pooling. Ni et al. further incorporated edge width similarity, in addition to edge magnitude and direction similarities, in the ESIM method [24]; these three similarities are modelled from a parametric edge model [25].
Different from most of their works on SCIs, Ni et al. measure gradient information with Gabor filters in the Gabor feature-based model (GFM) [26]. GFM is perhaps the best performing QA method for SCIs to date [27]. There are also methods that are not based on edge properties. The feature quality index (FQI) [28], proposed by Rahul and Tiwari, is based on assessing feature points; the underlying motivation is that the HVS is more sensitive to changes in features than in intensity. The features are extracted using DoG.
From the review of existing methods, none of them utilizes the Prewitt kernel for extracting gradient information. The performance of the Prewitt kernel is worth exploring as it has low computational complexity and performs well in assessing the quality of NIs [9]. With its lower computational complexity, or shorter processing time, the Prewitt kernel is suitable for real-time processing such as real-time video processing [29]. Many other image processing applications also use the Prewitt kernel for edge detection, such as autofocusing [30] and image fusion [31]. Thus, it is essential to explore the performance of the Prewitt kernel for the QA of SCIs and NIs. Moreover, most of the existing methods in the literature assess the quality of SCIs without differentiating pictorial and textual regions. However, the HVS uses a larger visual field when deciphering pictorial regions compared to textual regions [18]; hence, the pictorial and textual regions should be assessed with different kernel sizes. For the proposed method, a larger kernel size is applied to the lower gradient, mostly pictorial, regions. The application of a larger kernel size is verified with the experiments in Section 3.7.
We propose a generalized QA method for both SCIs and NIs based on edge magnitude and direction, dubbed EMD for convenience. EMD is motivated by SSIM [5]: where SSIM computes the similarities of luminance, contrast and structure, EMD computes the similarities of edge magnitude and direction instead. From the authors' previous work, edge or gradient information can reflect the quality levels of both SCIs and NIs [27]. For the edge direction similarity, the dot product of direction vectors is computed; this differs from most existing methods, which utilize the similarity equation from [5], and it simplifies the computation of the final quality index. To the best of the authors' knowledge, there is minimal work that performs well in assessing the quality of both SCIs and NIs. Many image QA methods incorporate edge information, such as those by Zhu and Wang [32] and Liu et al. [33]. The most related work is that of Min et al. [34], which focuses only on the QA of NIs and SCIs affected by compression distortions and ignores other distortions such as Gaussian noise and blurring. EMD, on the other hand, is able to assess the quality of NIs and SCIs for all types of distortion available in the benchmark datasets with good performance.
The major contribution of this research is the proposal of a new method, EMD, that assesses the quality of both NIs and SCIs without distinguishing the type of image being assessed. The performance of the Prewitt kernel for the QA of SCIs is also shown to be competitive. Besides that, a new framework that considers a larger kernel size is proposed.
The remainder of this paper is organized as follows. Section 2 provides the details of the proposed EMD, elaborating the computations of edge strength and direction for the low and high gradient regions. The performance of EMD is compared with state-of-the-art methods in Section 3, where the results are also discussed and analysed. Section 4 concludes this paper.

THE PROPOSED METHOD - EMD
Since edge information works for both SCIs and NIs [27], the proposed EMD is designed based on edge information too. Specifically, edge magnitude and edge direction are chosen due to their relevance to NIs and SCIs [22,24]. The Prewitt operator, as shown in Figure 1, is chosen to measure the edge information. This choice is inspired by the well-known NI QA method GMSD [9]: since GMSD has good performance for assessing both NIs and SCIs [27], the Prewitt operator is expected to perform well and is incorporated in EMD. The edge magnitude is computed by convolving the Prewitt operators with the target image:

M_x = P_x * I, (1)
M_y = P_y * I, (2)

where * denotes the convolution operation, I is the target image, and P_x and P_y are the vertical and horizontal Prewitt operators, respectively. Parameters M_x and M_y are the edge magnitudes in the vertical and horizontal directions. The final edge magnitude (EM) for an image is computed as the norm of M_x and M_y:

EM = √(M_x² + M_y²). (3)

Equation (3) can also be viewed as a computation of image complexity. According to the work of Redies et al. [35], image complexity is measured as the norm of the gradient magnitude along all directions. Since only the vertical and horizontal directions are measured in Equation (3), it can be considered a simplified computation of image complexity.
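As a concrete illustration, the edge magnitude computation above can be sketched in Python. The 1/3 normalization of the Prewitt kernel and the symmetric boundary handling are assumptions here, chosen for convenience rather than taken from the paper:

```python
import numpy as np
from scipy.signal import convolve2d

# 3x3 Prewitt operators; the 1/3 normalization is an assumed convention.
P_X = np.array([[1, 0, -1],
                [1, 0, -1],
                [1, 0, -1]], dtype=float) / 3.0
P_Y = P_X.T  # horizontal operator is the transpose of the vertical one

def edge_magnitude(img):
    """Edge magnitude map: EM = sqrt(Mx^2 + My^2) of the Prewitt responses."""
    m_x = convolve2d(img, P_X, mode="same", boundary="symm")
    m_y = convolve2d(img, P_Y, mode="same", boundary="symm")
    return np.sqrt(m_x ** 2 + m_y ** 2)
```

A flat image yields zero EM everywhere, while a vertical step edge produces high EM along the step, which is the behaviour the method relies on.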
In [18], it was verified that the HVS has a smaller perceptual area, or visual span, while reading text. Based on this characteristic, the HVS is assumed to perceive high gradient regions with a smaller visual span. This is reasonable as text areas have higher gradient content than pictorial areas, as illustrated in Figures 2 and 3. In Figure 2(a) and Figure 2(c), two SCIs from [15] are shown with their EM maps on the right as Figure 2(b) and Figure 2(d). In the EM maps, higher magnitudes appear as white pixels. It is observed that textual regions have higher EM compared to pictorial regions. For NIs, the examples are shown in Figure 3(a) and Figure 3(b) [36]; Figure 3(c) and Figure 3(d) show that most regions of the images have low gradient information.
The low and high gradient regions of an image can be separated with EM. Pixels with EM greater than a threshold are regarded as the high gradient region (HG), while the low gradient region (LG) is formed by the pixels with EM lower than the threshold. The determination of the threshold value is important as the performance of EMD varies with it. For simplicity, the mean EM of the image is used as the threshold, thres:

thres = (1/N) Σ_{n=1}^{N} EM(I_n), (4)
HG = {I_n | EM(I_n) > thres}, LG = {I_n | EM(I_n) ≤ thres}, (5)

where N is the total number of pixels and I_n is a particular pixel in the target image. The EM of the LG region is recomputed with a larger Prewitt operator, as shown in Figure 4. The utilization of the larger Prewitt operator is further justified in Section 3.7.
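The mean-threshold partition can be expressed directly in NumPy; `em` below is assumed to be an edge magnitude map computed as above:

```python
import numpy as np

def split_regions(em):
    """Split an edge-magnitude map into HG/LG boolean masks using the mean EM."""
    thres = em.mean()      # threshold = (1/N) * sum of EM over all pixels
    hg = em > thres        # high gradient region (mostly text in SCIs)
    lg = ~hg               # low gradient region (mostly pictorial content)
    return hg, lg
```

Every pixel falls into exactly one of the two masks, so the two regions together cover the whole image.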
The EM for the LG region, EM_LG, and the HG region, EM_HG, can then be reformulated as:

M′_x = P′_x * I_LG, (6)
M′_y = P′_y * I_LG, (7)
EM_LG = √(M′_x² + M′_y²), (8)
M″_x = P_x * I_HG, (9)
M″_y = P_y * I_HG, (10)
EM_HG = √(M″_x² + M″_y²), (11)

where P′_x and P′_y are the Prewitt operators of the larger size, and I_LG and I_HG are the LG and HG regions of the target image, respectively. The EM maps for a complex region are shown in Figure 5(b): the edges are classified as high gradient regions while the background is treated as low gradient regions, which shows that the mean of EM is applicable to complex regions too. In the following discussion, the HG and LG regions undergo the same process; therefore, the HG and LG subscripts are dropped for simplicity.
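The exact larger operator of Figure 4 is not reproduced here; the sketch below assumes a straightforward Prewitt-style extension (uniform columns of +1, 0, -1 signs), which may differ from the authors' kernel, and computes a per-region EM map as in Equations (8) and (11):

```python
import numpy as np
from scipy.signal import convolve2d

def prewitt_pair(size):
    """Prewitt-style vertical/horizontal kernels of odd size (assumed extension)."""
    assert size % 2 == 1
    half = size // 2
    row = np.sign(np.arange(half, -half - 1, -1)).astype(float)
    p_x = np.tile(row, (size, 1))  # columns of +1 / 0 / -1 (unnormalized)
    return p_x, p_x.T

def region_edge_magnitude(img, mask, size):
    """Edge magnitude restricted to a region mask, using a kernel of the given size."""
    p_x, p_y = prewitt_pair(size)
    m_x = convolve2d(img, p_x, mode="same", boundary="symm")
    m_y = convolve2d(img, p_y, mode="same", boundary="symm")
    em = np.sqrt(m_x ** 2 + m_y ** 2)
    return np.where(mask, em, 0.0)  # zero outside the region
```

With `size=3` this reduces to the standard Prewitt operator, so the same routine serves both the HG (3 × 3) and LG (larger kernel) branches.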
After computing the EM of both the reference and distorted images, the similarity of edge magnitude can be computed as:

S_EM = (2 EM_r EM_d + C1) / (EM_r² + EM_d² + C1), (12)

where EM_r and EM_d are the edge magnitudes of the reference and distorted images, respectively, and C1 is a constant to remove instability, as suggested in [5].
For edge direction, a direction vector v⃗ is formed from the edge magnitudes in the vertical and horizontal directions:

v⃗ = M_x x̂ + M_y ŷ, (13)

where x̂ and ŷ are the unit vectors in the vertical and horizontal directions, respectively. For the comparison of vectors, v⃗ is transformed into a unit vector v̂ as:

v̂ = v⃗ / EM, (14)

where EM is the edge magnitude as defined in Equations (8) and (11). The similarity of edge direction is computed from the dot product of the unit vectors in Equation (14) of the reference and distorted images:

S_ED = (M_x,r M_x,d + M_y,r M_y,d + C2) / (EM_r EM_d + C2), (15)

where the numerator equals EM_r EM_d (v̂_r · v̂_d) + C2, and C2 is another constant, similar to C1, that prevents numerical instability when the denominator is close to zero. By considering the similarity indices from Equations (12) and (15), the EMD index of a region, EMD_r, can be computed as:

EMD_r = [S_EM]^p1 [S_ED]^q1, (16)

where p1 and q1 denote the relative importance of the edge magnitude and direction similarities, and the subscript r denotes the region, which can be LG or HG. The same weight is given to both similarities, so p1 = q1 = 1, and EMD_r can be further simplified to:

EMD_r = (2(M_x,r M_x,d + M_y,r M_y,d) + C3) / (EM_r² + EM_d² + C3), (17)

with C3 regarded as another constant. The final quality index of EMD can then be computed directly from the EMD_r indices of LG and HG:

EMD = [EMD_LG]^p2 [EMD_HG]^q2, (18)

where EMD_LG and EMD_HG denote the mean of EMD_r over the LG and HG regions, respectively. Parameters p2 and q2 have the same definition as p1 and q1 in Equation (16). Again, for simplicity, p2 and q2 are set to 1, so the final EMD index for the target image is:

EMD = EMD_LG EMD_HG. (19)
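Putting the pieces together, a minimal sketch of the per-region similarity and the final pooling might look as follows; the SSIM-style combined similarity form and the stability constant are assumptions for illustration, not the authors' tuned values:

```python
import numpy as np

C3 = 1e-4  # assumed stability constant; the paper's value is not given here

def similarity_map(mx_r, my_r, mx_d, my_d, c3=C3):
    """Combined edge magnitude-and-direction similarity of reference vs. distorted gradients."""
    num = 2.0 * (mx_r * mx_d + my_r * my_d) + c3
    den = mx_r ** 2 + my_r ** 2 + mx_d ** 2 + my_d ** 2 + c3
    return num / den  # per-pixel similarity in (-1, 1]

def emd_index(sim, hg_mask):
    """Final index: product of the mean similarities over the HG and LG regions."""
    return sim[hg_mask].mean() * sim[~hg_mask].mean()
```

For identical reference and distorted gradients the map is exactly one everywhere, so a distortion-free image scores a perfect index.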

EXPERIMENTAL RESULTS
To validate the efficacy of EMD, experiments were conducted to compare the performance of different methods using two SCI databases and one NI database.

Databases
Two types of databases were used in the experiments: SCI databases and an NI database. For SCIs, the SIQAD [15] and SCID [24] databases were used. In the SIQAD database, 980 distorted SCIs are generated from 20 reference SCIs. All reference images are captured through screen snapshots at various resolutions. Seven types of distortion are applied to the reference SCIs: Gaussian blur (GB), motion blur (MB), Gaussian noise (GN), JPEG compression (JPEG), JPEG2000 compression (J2K), contrast change (CC) and layer segmentation-based coding (LSC). For each distortion type, seven distortion levels were created for each reference image. The second database, SCID, is the largest SCI database to date. It contains 40 reference SCIs cropped to a fixed size of 1280 × 720 and captured through screen snapshots. Nine distortion types were applied to the reference images, with five distortion levels for each type: MB, GN, GB, CC, JPEG, J2K, HEVC screen content coding (HEVC-SCC), colour quantization with dithering (CQD) and colour saturation change (CSC). In total, 1800 distorted images were generated for the SCID database. For NIs, the LIVE [36] image database is used. It contains 779 distorted images generated from 29 reference images with five types of distortion: GB, additive white noise (AWN), JPEG, J2K and simulated fast fading Rayleigh channel (FF). Most of the images have a resolution of 768 × 512. The details of all three databases are summarized in Table 1.

Evaluation criteria
Three evaluation criteria are used throughout the experiments: the Spearman rank-order correlation coefficient (SC), the Pearson linear correlation coefficient (PC) and the root mean squared error (RE). These criteria are chosen following the common practice of existing works in the literature [5-11, 14-20, 22-24, 26, 28]. SC measures prediction monotonicity; its value indicates the degree to which the objective and subjective scores fit a monotonic function. It is defined as:

SC = 1 - 6 Σ_{i=1}^{S} r_i² / (S(S² - 1)), (20)

where r_i denotes the ranking difference of the ith image in terms of subjective and objective scores and S is the total number of images in the database.
PC measures the prediction accuracy of the methods; its value reflects the linearity between the objective and subjective scores. Meanwhile, the consistency between objective and subjective scores is quantified by RE. As suggested by the Video Quality Experts Group (VQEG) [37], objective scores need to be transformed into predicted scores before computing PC and RE. This is done with a four-parameter logistic regression function, defined as [37]:

Q = (β1 - β2) / (1 + exp(-(Q_t - β3)/β4)) + β2, (21)

where β1, β2, β3 and β4 are model parameters fitted through regression, and Q and Q_t are the predicted and objective scores, respectively. After the transformation to predicted scores, PC and RE can be computed as:

PC = Σ_i (Q_i - Q̄)(B_i - B̄) / √(Σ_i (Q_i - Q̄)² Σ_i (B_i - B̄)²), (22)
RE = √((1/S) Σ_i (Q_i - B_i)²), (23)

where B is the subjective score and Q̄ and B̄ are the mean values of Q and B, respectively. For SC and PC, a value of one indicates a perfect result, while zero is the worst case. A value of zero is the best result for RE, while its worst case is infinity.
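The three criteria and the VQEG mapping can be computed with SciPy; the particular four-parameter logistic form and the initial guesses below are common choices and should be treated as assumptions:

```python
import numpy as np
from scipy.optimize import curve_fit
from scipy.stats import pearsonr, spearmanr

def logistic4(q, b1, b2, b3, b4):
    """Four-parameter logistic mapping from objective to predicted scores."""
    return (b1 - b2) / (1.0 + np.exp(-(q - b3) / b4)) + b2

def evaluate(objective, subjective):
    """Return (SC, PC, RE) between objective scores and subjective scores."""
    objective = np.asarray(objective, dtype=float)
    subjective = np.asarray(subjective, dtype=float)
    sc = spearmanr(objective, subjective).correlation
    # Heuristic initial guesses for the logistic fit (an assumption).
    p0 = [subjective.max(), subjective.min(), objective.mean(), objective.std() + 1e-6]
    params, _ = curve_fit(logistic4, objective, subjective, p0=p0, maxfev=20000)
    predicted = logistic4(objective, *params)
    pc = pearsonr(predicted, subjective)[0]
    re = float(np.sqrt(np.mean((predicted - subjective) ** 2)))
    return sc, pc, re
```

Note that SC is computed on the raw objective scores, since a monotonic mapping does not change rank order, while PC and RE use the logistic-mapped predictions.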
Since EMD is modelled for the luminance component of an image only, colour-related distortions are omitted from the comparison. Thus, the 400 images distorted by CQD and CSC in the SCID image database are excluded, and the remaining 1400 images in the SCID database are used for testing.
Most SCI QA methods do not involve image scaling before processing, while most NI QA methods do. This may be due to the smaller size of the contents in SCIs; in NIs, the contents are usually larger, and proper scaling helps to reduce the computational cost of assessment. Therefore, when applying NI QA methods to the SCI databases, the scaling process is omitted; the methods involved are SSIM, FSIM and GMSD. Meanwhile, when testing SCI QA methods on the NI database, all images are down-sampled to half of their original size before processing. These two steps allow the SCI and NI QA methods to be compared in a similar manner.

Performance results
The overall performances of all methods on the three databases are shown in Table 2, and the three best performing methods for each criterion are highlighted. EMD is among the best three for all databases in terms of SC, PC and RE. Compared to the other QA methods, EMD has excellent results on both the SCI and NI databases. For the SIQAD image database, EMD has the best SC of 0.885. In terms of PC, it attains a result similar to SQMS with a value of 0.887, only 0.004 lower than the best result, which is achieved by SVQI. On the other hand, EMD has the best SC, PC and RE for the SCID image database, attaining 0.904, 0.903 and 6.097, respectively; these results are slightly better than those of GFM and SVQI.
For the LIVE image database, VIF and FSIM perform better than EMD, but the results of EMD are close to both methods. For SC, EMD is only 0.001 behind VIF and has the same result as FSIM. The PC and RE values of EMD are also close to those of VIF and FSIM. Thus, EMD performs well in assessing the quality of NIs too. However, VIF and FSIM do not achieve good correlations for SCIs, whereas EMD assesses the quality of both SCIs and NIs well. In other words, EMD generalizes better across different types of images.
The weighted average results are also shown in Table 2; they are calculated based on the number of distorted images in each database. EMD attains the best weighted averages, with an SC and PC of 0.919 and 0.915, while SVQI, SQMS and GFM lag behind. The SCI QA methods have better weighted average results as they perform reasonably for both the SCI and NI databases; the averages of VIF and FSIM are lower due to their lower correlations for SCIs. Generally, the SCI QA methods perform well for SCIs but fall behind some of the NI QA methods for NIs. The NI QA methods mostly do not perform well on the SCI databases, except for VIF and GMSD, which perform well as they incorporate gradient information: GMSD compares gradient information computed with the Prewitt operator, while VIF incorporates gradient information through the subbands of the steerable wavelet decomposition. This observation shows that edge or gradient information is a useful indicator of quality in both SCIs and NIs. From the overall view of the results in Table 2, EMD performs well on all image databases and is thus superior in generality to the other methods.
The results of the methods according to the distortion types of the SIQAD, SCID and LIVE image databases are shown in Table 3, Table 4 and Table 5, respectively. The best three performing methods for each criterion and distortion type are highlighted. The RE results are omitted as they are similar to those of PC.
For the SIQAD image database, EMD is among the best three performing methods for MB, JPEG and LSC. For the GB and CC distortion types, EMD has its PC among the best three, and its SC results for these two distortion types are very close to those of the best three methods. For GN and J2K, the performance of EMD is also very close to that of the best performing methods. The weighted averages based on the distortion types are also shown. These weighted averages differ from those of Table 2: the weighted averages in Table 3, Table 4 and Table 5 are computed across distortion types, while the weighted average in Table 2 is calculated from the overall results. EMD performs well on the weighted averages; its PC result of 0.858 is the best, and its SC is only 0.003 behind GFM. Overall, the performance of EMD on the SIQAD image database is excellent.
The performance of EMD on the SCID image database is also comparable with the other methods. For the GN, JPEG and J2K distortion types, EMD's SC and PC results are among the best three. For the remaining distortion types, the results of EMD are close to those of the best performing method for each criterion. From the weighted averages, EMD and IWSSIM are the best two performing methods, with only minor differences between them. The results in Table 3 and Table 4 show that EMD performs well in assessing the quality of SCIs.
For the LIVE image database, VIF performs the best, which is reasonable as VIF is designed for NIs. Meanwhile, EMD has competitive SC results for J2K, JPEG, AWN and GB. The PC results of EMD for each distortion type are close to those of the best performing methods; the differences never exceed 0.03 and are minor compared to those of the remaining QA methods. In terms of the weighted averages, EMD and VIF perform similarly with only minor differences; they are the two methods with the best weighted averages. The performance of EMD is thus competitive for the LIVE image database, and its performance in assessing the quality of NIs is competitive too.
Overall, the performance of EMD across distortion types is competitive with the other methods. The generality of EMD is also evident, as it has excellent weighted averages for all three image databases. Other methods, such as IWSSIM and VIF, only attain the best results for the SCID and LIVE image databases, respectively; their performances on the other two databases are not competitive compared to EMD. The results of Table 3, Table 4 and Table 5 thus demonstrate the generality of EMD in terms of distortion types.

Statistical results
The F-statistic is utilized to compare the performances of the QA methods statistically. The variances of the residuals between the objective and subjective scores of the different methods are compared with one another, under the assumption that the residual differences follow a Gaussian distribution. Table 6 shows the overall statistical results of all methods on all databases. For each column, the first, second and third sub-columns refer to the SIQAD, SCID and LIVE image databases, respectively. The symbol '-' denotes that the method of the row is statistically indistinguishable from the method of the column, while a one or a zero indicates that the method of the row is statistically better or worse, respectively, than the method of the column. Different from the results in Section 3.4, EMD and GFM have similar statistical performance; they are the best two methods. For the SCID image database, GFM outperforms SVQI while EMD is statistically indistinguishable from SVQI. Meanwhile, EMD outperforms GFM on the LIVE image database: it outperforms FSIM while GFM is statistically indistinguishable from FSIM. Both methods otherwise have similar statistical performance for the LIVE image database, and overall they are better than the rest of the SCI and NI QA methods except GMSD and VIF. The results in Table 6 also show the excellent generality of EMD in assessing the quality of SCIs and NIs: EMD achieves good results on all image databases, while the other methods only achieve good results on one or two.
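A standard two-sided variance-ratio F-test on the prediction residuals can reproduce the 1/0/'-' coding of Table 6; the exact significance level and test procedure used by the authors are not specified, so the 5% level below is an assumption:

```python
import numpy as np
from scipy.stats import f as f_dist

def f_test(res_a, res_b, alpha=0.05):
    """Compare residual variances of methods A and B.

    Returns 1 if A is statistically better (smaller residual variance),
    0 if statistically worse, and -1 if the difference is insignificant
    (the '-' entries in Table 6). Assumes Gaussian residuals.
    """
    var_a = np.var(res_a, ddof=1)
    var_b = np.var(res_b, ddof=1)
    ratio = var_a / var_b
    dfa, dfb = len(res_a) - 1, len(res_b) - 1
    lo = f_dist.ppf(alpha / 2.0, dfa, dfb)        # lower critical value
    hi = f_dist.ppf(1.0 - alpha / 2.0, dfa, dfb)  # upper critical value
    if ratio < lo:
        return 1
    if ratio > hi:
        return 0
    return -1
```

Comparing a method against itself gives a ratio of exactly one, which always falls inside the acceptance region, as expected.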
Following the pattern of Section 3.4, the statistical results of the QA methods according to the distortion types are also presented, in Table 7, Table 8 and Table 9. From the results in Table 7, EMD performs the best: it outperforms the other methods nine times and no other method outperforms EMD. Thus, EMD has the best statistical significance on the SIQAD image database. IWSSIM has a similar performance, also outperforming the other methods nine times; however, IWSSIM mainly beats the other methods on CC distorted images. This shows that the generality of EMD is better, as its statistical performance is more consistent across distortion types.
Similar to the results in Table 4, the statistical performance of EMD is the best on the SCID image database. It statistically outperforms the other methods 39 times and is only statistically inferior to IWSSIM for the GB distortion. The statistical results of EMD on SCID are far better than those of the remaining methods, and its performance is consistent for all distortion types in the SCID image database.
For the LIVE image database, VIF undoubtedly outperforms the other methods without being statistically inferior to any of them. EMD's performance is also competitive: it outperforms the other methods 13 times and is inferior to some of them six times. These results differ from those in Section 3.4; VIF is better than the other methods, while EMD is competitive with every method other than VIF.
The statistical results presented in this section show that EMD has good statistical generality. It assesses the quality of SCIs and NIs with good results, both overall and across distortion types. It has the best results for SCIs, and for NIs it is only inferior to VIF, which in turn has poorer performance on the SCI databases. EMD's statistical generality is evident from its consistent results across the distortion types of each database, as shown in Table 7 to Table 9.

Results for different kernels
The EMD results with different filter kernels are presented in this section. In the literature, existing QA methods utilize different kernels to obtain edge information. To study their effect, the results of EMD with different kernels are compared and discussed. All kernels are used in EMD with the same sizes as the Prewitt kernels mentioned in Section 2: a 5 × 5 kernel is used for the LG regions, while a 3 × 3 kernel is applied to the HG regions. The results of EMD with the other kernels are tabulated in Table 10, with the best result highlighted for each criterion. For brevity, only the overall results for each database are shown; the results across distortion types are very similar to the overall results.
The results in Table 10 show that EMD with the Prewitt kernel performs the best for all databases, and similar results are obtained for the weighted averages. This demonstrates the superior performance and generality of EMD with the Prewitt kernel. The results of EMD with the Scharr and Sobel kernels are close to those with the Prewitt kernel for the SCID and LIVE image databases; for the SIQAD database, EMD with the Scharr and Sobel kernels performs much worse. The condition is reversed in

Results for different kernel sizes
The utilization of a larger kernel size for the LG regions, as elaborated in Section 2, needs to be justified. To this aim, the performance of EMD with different kernel sizes for the LG regions is presented and discussed. The results are tabulated in Table 11. For brevity, only the overall SC results are shown; the results of PC and RE are similar. The kernel size for the HG regions is fixed at 3 × 3 as mentioned in Section 2, while kernel sizes of 3 × 3, 5 × 5 and 7 × 7 are tested for the LG regions. From the results in Table 11, if a smaller kernel size is utilized, the performance of EMD drops for SCIs: when the 3 × 3 kernel size is also used for the LG regions, the performance on the SIQAD and SCID image databases drops significantly, especially for SCID. This result is reasonable as the low gradient, mostly pictorial, regions are perceived with a larger visual span. On the other hand, the performance of EMD with the 3 × 3 kernel size on NIs is similar to that with the 5 × 5 kernel size. If a 7 × 7 kernel size is utilized for the LG regions, the performance of EMD on LIVE remains almost the same, while the SC for the SIQAD image database drops to 0.869 and that for the SCID image database increases to 0.920. Thus, the kernel size for the LG regions should not be too large. The images in the SCID database have larger edge magnitudes than those in the SIQAD database; therefore, the performance of EMD with the 7 × 7 kernel size improves for SCID while it fades for SIQAD.
Regarding the kernel sizes, the size used for the LG regions should be larger than the one used for the HG regions. However, the increase in size should depend on the edge magnitude of the target image. The kernel size of 5 × 5 is chosen as it gives the best overall results.

CONCLUSION
In a nutshell, all contributions of this paper are deliberated and justified with experimental results. A generalized QA method, EMD, is proposed for assessing the quality of both SCIs and NIs. EMD incorporates edge magnitudes and edge directions, with the edge information computed through Prewitt kernels. For the similarity comparison, the edge magnitude is compared directly, while the edge directions are compared through the dot product of edge vectors. The experimental results show that EMD performs well on both the SCI and NI databases: EMD has the best results for SCIs and performs similarly to VIF for NIs. The results show the superiority of EMD over the other SCI and NI QA methods, and its generality is demonstrated by good average results across different image and distortion types. The choice of Prewitt filters for EMD is also verified against the Roberts, Sobel and Scharr kernels, and the utilization of a larger kernel size for the LG regions is justified. Increasing the kernel size of the LG regions can improve the assessment results, but the appropriate increase depends on the edge magnitude of the image. In the future, EMD could be enhanced to deal with colour distortions and further extended to video QA tasks.