1. Introduction
Computer-based automatic image quality assessment (IQA) has been sought after for decades because numerous image and video applications need such assessment to automate their quality maintenance. IQA research has advanced significantly to date; however, it remains an active area of research aimed at bringing methods closer to human-level ability. In the literature, there are three principal IQA approaches. No-reference image quality assessment (NR-IQA) uses a single distorted image without any reference image, whereas in reduced-reference IQA (RR-IQA), partial information about a reference image is given. The third category is full-reference IQA (FR-IQA), where the complete reference image is given along with the distorted one. In this paper, we deal with FR-IQA.
Table 1 presents a brief survey of several state-of-the-art IQA approaches, which we compare in this paper. We use a separate column for the pooling strategy because recent IQA research tends to combine multiple features, where pooling plays an important role. Early pixel-based, fast IQA methods such as mean squared error (MSE) and peak signal-to-noise ratio (PSNR) consider neither the human visual system (HVS) nor any other aspect of human perception. Thus, those approaches fail to achieve a good correlation with human assessment [
1,
2]. Two images with the same PSNR or MSE may be perceived in totally different ways by a human observer. Since humans are the ultimate receivers of images, the search for methods that achieve a closer correlation with human judgment is ongoing. Wang et al., in their revolutionary work on the structural similarity index, or SSIM [
3], argued that human visual perception is highly sensitive to structural information. The SSIM index incorporates luminance, contrast, and structural comparison information and achieves a very good correlation with the mean opinion scores (MOS) of human observers. Inspired by the success of SSIM, several extended versions, such as the multi-scale structural similarity for image quality assessment (MS-SSIM) [
4] and Information content weighting for perceptual image quality assessment (IW-SSIM) [
5], were proposed by the same research group. IW-SSIM utilizes an image pyramid to decompose the original and distorted images into versions of varying scales, and then computes the information content from the images. Finally, it finds the quality score using the information content as a weighting function.
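To make the idea concrete, the SSIM comparison can be sketched as a single global statistic, as in the following simplified Python/NumPy snippet. This is an illustration only: the actual index is computed over local sliding windows (typically Gaussian-weighted) and then pooled, and the constants follow the commonly used C1 = (0.01L)^2, C2 = (0.03L)^2 form for dynamic range L.

```python
import numpy as np

def ssim_global(x, y, data_range=255.0):
    """Global (single-window) SSIM sketch; the real index pools local windows."""
    c1 = (0.01 * data_range) ** 2  # stabilizes the luminance term
    c2 = (0.03 * data_range) ** 2  # stabilizes the contrast/structure term
    x = x.astype(np.float64)
    y = y.astype(np.float64)
    mx, my = x.mean(), y.mean()
    vx, vy = x.var(), y.var()
    cov = ((x - mx) * (y - my)).mean()
    return ((2 * mx * my + c1) * (2 * cov + c2)) / \
           ((mx ** 2 + my ** 2 + c1) * (vx + vy + c2))
```

For identical images the statistic is 1; any luminance, contrast, or structural deviation pulls it below 1.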
Based on shared information between the reference and distorted images, Sheikh et al. proposed the information fidelity criteria (IFC) [
6] and the visual information fidelity (VIF) [
7]. The most apparent distortion (MAD) approach [
8] separates images based on the distortion and applies either a detection-based strategy or an appearance-based strategy. Some of the methods, such as the noise quality measure (NQM) [
9] and the visual signal-to-noise ratio (VSNR) [
10], take into account the HVS by incorporating interactions among different visual signals. In contrast, other approaches, including the popular feature similarity index or FSIM [
11], emphasize phase congruency [
12,
13,
14]. FSIM uses the image gradient as a secondary feature and local quality maps are weighted by phase congruency to obtain the final score. The image gradient has been used effectively in a number of other works [
15,
16]. Xue et al., in their gradient magnitude similarity deviation (GMSD) approach [
17], used the gradient magnitude with a different pooling strategy, by applying the standard deviation, and Alaei et al. adopted a similar approach for assessing document images [
18]. Both examples prove the effectiveness of standard deviation pooling; however, the authors of the GMSD approach showed that standard deviation (SD) pooling is not effective for all types of methods. Wang et al. proposed the multi-scale contrast similarity deviation (MCSD) method [
19], which can be termed as a continuation of SSIM and MS-SSIM, since it also uses the root mean square (RMS) contrast similarity; however, they employed standard deviation pooling for the final score.
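As a concrete illustration of SD pooling, a GMSD-style score can be sketched as follows. Simple finite differences stand in for the Prewitt filters of the original method, and the stability constant `c` is an illustrative value rather than the authors' tuned one.

```python
import numpy as np

def gradient_magnitude(img):
    """Gradient magnitude via forward differences (Prewitt filters in the paper)."""
    img = img.astype(np.float64)
    gx = np.zeros_like(img)
    gy = np.zeros_like(img)
    gx[:, :-1] = img[:, 1:] - img[:, :-1]
    gy[:-1, :] = img[1:, :] - img[:-1, :]
    return np.sqrt(gx ** 2 + gy ** 2)

def gmsd_like(ref, dst, c=170.0):
    """Gradient-magnitude similarity map pooled by its standard deviation.
    A perfect copy yields 0; larger values indicate stronger distortion."""
    gr = gradient_magnitude(ref)
    gd = gradient_magnitude(dst)
    gms = (2.0 * gr * gd + c) / (gr ** 2 + gd ** 2 + c)  # similarity in (0, 1]
    return float(np.std(gms))  # SD pooling instead of mean pooling
```

The SD captures how unevenly the distortion spreads over the image, which is exactly the property mean pooling discards.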
Meanwhile, inspired by vision-related psychological research, visual saliency (VS)-based IQA methods [
20,
21], which utilize different kinds of visual saliencies [
22,
23,
24], have attracted researchers’ attention. In the visual saliency index (VSI) method [
23], VS is used as both a quality map and the weighting function at the pooling stage. The spectral residual similarity index (SR-SIM) [
24] uses the spectral residual saliency, which makes the approach very fast while maintaining a competitive correlation with the mean opinion score. Combining VS with other features has also become popular [
25,
26]. Li et al. proposed an approach that combines VS and FSIM, while Jia et al. recently used contrast and spectral residual saliency together with summation-based SD pooling.
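To illustrate, the spectral residual saliency underlying SR-SIM can be sketched as below. This is a simplified single-scale version: the 3 × 3 box filter and the final normalization are illustrative choices, and the original formulation additionally smooths the result with a Gaussian filter.

```python
import numpy as np

def spectral_residual_saliency(img):
    """Saliency from the 'residual' of the log-amplitude spectrum:
    what remains after subtracting the smoothed (expected) spectrum."""
    f = np.fft.fft2(img.astype(np.float64))
    log_amp = np.log(np.abs(f) + 1e-12)
    phase = np.angle(f)
    # Smooth the log-amplitude spectrum with a 3x3 box filter (circular edges).
    smoothed = np.zeros_like(log_amp)
    for dy in (-1, 0, 1):
        for dx in (-1, 0, 1):
            smoothed += np.roll(np.roll(log_amp, dy, axis=0), dx, axis=1)
    smoothed /= 9.0
    residual = log_amp - smoothed
    # Back to the spatial domain: residual amplitude, original phase.
    sal = np.abs(np.fft.ifft2(np.exp(residual + 1j * phase))) ** 2
    return sal / (sal.max() + 1e-12)  # normalize to [0, 1] for convenience
```

Because it only needs two FFTs and a small smoothing pass, this saliency is cheap to compute, which is what makes SR-SIM fast.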
In the context of the HVS, center bias in early eye movements is an established fact in psychological vision research [
27,
28,
29,
30]. Bindemann found that eye movement is biased not only to the scene center but also to the screen center [
31]. As a result, if a scene appears at the center of the screen, it will receive the most attention. For example, in
Figure 1, the human eye will first move to the Block05 region, and if that part contains visually important information, it will attract even more attention. As a result, people will be more sensitive to distortions in this region. To the best of our knowledge, no IQA research has considered this center bias for quality assessment.
In this paper, we propose a new IQA method that accounts for the center emphasis of the HVS. In the proposed method, we first obtain both the contrast and VS similarity maps for the entire image. To give center emphasis, we find the VS similarity map of the mid-region and apply element-wise multiplication in the mid-part to raise the similarity deviation there, whereas for the contrast similarity, we apply element-wise squaring in the center part. Contrast is a local quality map, so we do not calculate the contrast of the mid-area separately; VS, on the other hand, is a global quality map and is therefore calculated separately for the mid-region. The final score is obtained by a weighted summation of the standard deviations of both similarity maps; further details with mathematical equations are given in
Section 3.
We evaluated our proposed method on three popular benchmark databases for IQA research and compared it with 13 other state-of-the-art approaches. The results, in terms of correlation with human-evaluated scores, show that our proposed method outperforms the other approaches while requiring a reasonable amount of processing time.
This paper is organized as follows.
Section 2 describes some underlying theories and related techniques.
Section 3 explains the proposed center-emphasized assessment approach, and the results with relevant discussions are presented in
Section 4. Finally, the paper is concluded in
Section 5.
3. Proposed Center-Emphasized Quality Assessment
The general flow diagram of our proposed method is presented in
Figure 2. First, the center parts of both the reference and distorted images are extracted. To do this, we split the image into $3 \times 3$ image blocks as shown in
Figure 1, and the fifth block, which resides in the middle both horizontally and vertically, is taken as the center area. If the original image dimension is $M \times N$, then the corresponding dimension for the center block becomes $M_c \times N_c$, where:

$$M_c = \lfloor M/3 \rfloor, \qquad N_c = \lfloor N/3 \rfloor.$$

The center block is defined as a rectangular area identified by two corner points $(x_1, y_1)$ and $(x_2, y_2)$, where:

$$x_1 = M_c + 1, \quad y_1 = N_c + 1, \quad x_2 = 2M_c, \quad y_2 = 2N_c.$$
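In code, the 3 × 3 split and center-block extraction described here might look like the following sketch. Integer floor division is one reasonable choice when the dimensions are not multiples of three; the paper's exact indexing may differ.

```python
import numpy as np

def center_block(img):
    """Return the fifth block of a 3x3 grid split: the center region."""
    m, n = img.shape[:2]
    mc, nc = m // 3, n // 3           # center-block dimensions, roughly M/3 x N/3
    return img[mc:2 * mc, nc:2 * nc]  # rows/cols between the two corner points
```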
First, the saliency similarity maps for the full images and the middle images are found using Equations (1)–(6) and are denoted as $SS$ and $SS_m$, respectively. Simultaneously, the contrast similarity map for the full-size images, $CS$, is also obtained. As discussed before, we do not derive the CS map for the middle images.
Then, we increase the sensitivity of the center area within both of the maps. Let $SS^c$ and $CS^c$ be the center areas of $SS$ and $CS$, respectively. The updated middle parts will be determined as follows:

$$\widehat{SS}^c = SS^c \odot SS_m, \qquad \widehat{CS}^c = CS^c \odot CS^c,$$

where $\odot$ is the element-wise multiplication.
With the updated middle portions, we obtain the finalized maps $\widehat{SS}$ and $\widehat{CS}$, and using Equation (10), we calculate the final quality score of the proposed method, CEQI, as a weighted summation of the standard deviations of the two maps:

$$\mathrm{CEQI} = w \cdot \sigma\big(\widehat{SS}\big) + (1 - w) \cdot \sigma\big(\widehat{CS}\big),$$

where $\sigma(\cdot)$ denotes the standard deviation and $w$ is a weighting constant.
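Putting these steps together, one hypothetical end-to-end sketch of the scoring stage is given below. The map names, the in-place update, and the weight `w` are illustrative; only the operations themselves (center multiplication, center squaring, weighted SD pooling) follow the description above.

```python
import numpy as np

def ceqi_score_sketch(ss, ss_mid, cs, w=0.5):
    """Center-emphasized score sketch: boost the center of the saliency
    similarity map by the mid-region map, square the center of the contrast
    similarity map, then take a weighted sum of the standard deviations."""
    ss = ss.astype(np.float64).copy()
    cs = cs.astype(np.float64).copy()
    m, n = ss.shape
    mc, nc = m // 3, n // 3
    ss[mc:2 * mc, nc:2 * nc] *= ss_mid   # element-wise multiplication
    cs[mc:2 * mc, nc:2 * nc] **= 2       # element-wise squaring
    return w * ss.std() + (1.0 - w) * cs.std()
```

Because similarity values lie in (0, 1], squaring or multiplying the center entries pushes them further from 1, so any center distortion widens the deviation that the SD pooling measures.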
4. Results and Analysis
Experiments were carried out on three popular benchmark databases for IQA research—TID2008 [
36], CSIQ [
37] and LIVE [
38]. Our approach was compared with 13 other state-of-the-art approaches as listed in
Table 1. Basic information about the databases is given in
Table 2 and the distortion information is recorded in
Table 3.
For performance comparison, we use four commonly adopted metrics—Spearman’s rank-order correlation coefficient (SROCC), Kendall’s rank-order correlation coefficient (KROCC), Pearson’s linear correlation coefficient (PLCC), and the root mean square error (RMSE)—which we defined in
Section 2.4.
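Minimal NumPy versions of three of these metrics are sketched below; note that the simple double-argsort ranking here ignores ties, which the standard Spearman formulation resolves with average ranks.

```python
import numpy as np

def plcc(x, y):
    """Pearson's linear correlation coefficient."""
    return float(np.corrcoef(np.asarray(x, float), np.asarray(y, float))[0, 1])

def srocc(x, y):
    """Spearman's rank-order correlation: Pearson correlation of the ranks."""
    rank = lambda v: np.argsort(np.argsort(np.asarray(v))).astype(float)
    return plcc(rank(x), rank(y))

def rmse(x, y):
    """Root mean square error between predicted and subjective scores."""
    d = np.asarray(x, float) - np.asarray(y, float)
    return float(np.sqrt(np.mean(d ** 2)))
```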
Table 4 compares the four metrics among the different IQA models for all three databases. The top three values for each metric are typeset in boldface with light-gray shading; the top value is colored blue, the second highest red, and the third highest black. However, in the case of RMSE, the coloring is reversed, i.e., the lowest value is colored blue, and so on, since a lower RMSE implies a better method. We see that, for the largest database, TID2008, our proposed method outperforms all other methods in all metrics. For the other two databases, it achieves competitive performance. To find the overall performance, we calculated the weighted averages of the SROCC, KROCC, PLCC, and RMSE using the numbers of distorted images as weights, as proposed in Reference [
5]. It can be noticed that, compared to VSI and VSP, our approach showed better prediction accuracy, with (1.09%, 0.3%)-point, (2.44%, 0.39%)-point, and (2.19%, 0.22%)-point higher overall SROCC, KROCC, and PLCC values, respectively. The overall ranking based on performance is shown in
Table 5.
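The database-size weighting used for these overall figures amounts to the following simple computation (the numeric values in the usage note are illustrative, not results from the paper):

```python
def weighted_average(metric_values, num_images):
    """Overall metric: average weighted by each database's distorted-image count."""
    total = sum(num_images)
    return sum(v * n for v, n in zip(metric_values, num_images)) / total
```

For example, SROCC values of 0.90 and 0.80 on hypothetical databases with 100 and 300 distorted images average to 0.825 rather than the unweighted 0.85.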
Table 6 compares the SROCC performance for all distortion types; please refer to
Table 3 for a description of the abbreviations. We see that different methods perform better for different distortions, and performance even varies between databases. This is because images are not affected equally by a specific type of distortion; the effect depends on the color, the salient regions, and perhaps a combination of many other factors. Still, a distortion-wise comparison gives us a good understanding of whether an IQA method is biased toward some noise type. It can be seen that the proposed CEQI performs consistently well for all types of distortion; it is not overly biased toward any specific type, while retaining an average performance within the top three methods.
Figure 3 shows scatter plots of the predicted scores of different IQA approaches against the MOS/DMOS values on the TID2008 database. These results show that CEQI's predictions are consistent compared to those of the other methods, while providing a better correlation. We do not include PSNR because its predictions are highly inconsistent. NQM is also omitted for the same reason, although its performance is not as inconsistent as that of PSNR.
Although the prime consideration of an IQA model is the performance of its prediction, having a low computational cost is also a desirable feature, especially for a real-time system. We evaluated the various IQA models with MATLAB R2017b using a computer equipped with an Intel(R) Core(TM) i5-4670 CPU with a 3.40 GHz processor and 16 GB of RAM. The MATLAB codes provided by the authors were used and elapsed time was recorded using the traditional
tic-toc function. The results of these tests are shown in
Table 7. As expected, PSNR has the lowest computation time. Surprisingly, the gradient magnitude similarity deviation (GMSD) model can process 263.05 images per second with satisfactory performance (rank 4, as shown in
Table 5). VIF shows very good performance on the LIVE database, where it is the best-performing IQA method, but it can only process 1.79 images per second on average, which makes it inappropriate for real-time systems or systems with low processing capability. On the other hand, CEQI takes 15.25 ms to process an image, giving it the capability of processing 65.51 images per second. This frame rate meets the needs of almost all kinds of real-time systems.