Haoming Cai

Assessor360: Multi-sequence Network for Blind Omnidirectional Image Quality Assessment

May 24, 2023
Tianhe Wu, Shuwei Shi, Haoming Cai, Mingdeng Cao, Jing Xiao, Yinqiang Zheng, Yujiu Yang

Blind Omnidirectional Image Quality Assessment (BOIQA) aims to objectively assess the human perceptual quality of omnidirectional images (ODIs) without relying on pristine-quality reference information. It is becoming increasingly important with the advancement of virtual reality (VR) technology. However, the quality assessment of ODIs is severely hampered by the fact that the existing BOIQA pipeline lacks modeling of the observer's browsing process. To tackle this issue, we propose Assessor360, a novel multi-sequence network for BOIQA derived from the realistic multi-assessor ODI quality assessment procedure. Specifically, we propose a generalized Recursive Probability Sampling (RPS) method for the BOIQA task that combines content and detail information to generate multiple pseudo viewport sequences from a given starting point. Additionally, we design a Multi-scale Feature Aggregation (MFA) module with a Distortion-aware Block (DAB) to fuse the distorted and semantic features of each viewport, and we devise a Temporal Modeling Module (TMM) to learn viewport transitions in the temporal domain. Extensive experimental results demonstrate that Assessor360 outperforms state-of-the-art methods on multiple OIQA datasets.
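
A minimal sketch of how the RPS idea could be realized, assuming a precomputed probability map that already fuses content and detail scores; the window size, the map's construction, and the equirectangular-to-viewport projection are illustrative assumptions, not the paper's exact formulation:

```python
import numpy as np


def rps_sequence(prob_map, start, seq_len=10, window=30, rng=None):
    """Sketch of Recursive Probability Sampling (RPS): starting from a
    given viewing direction, repeatedly sample the next viewport center
    from a probability map restricted to a window around the current
    position. `prob_map` (H x W, nonnegative) is assumed given."""
    rng = rng or np.random.default_rng()
    H, W = prob_map.shape
    y, x = start
    sequence = [(y, x)]
    for _ in range(seq_len - 1):
        # Restrict sampling to a neighborhood of the current viewport so
        # transitions stay spatially plausible (an assumption, not the
        # paper's exact neighborhood definition).
        y0, y1 = max(0, y - window), min(H, y + window)
        x0, x1 = max(0, x - window), min(W, x + window)
        local = prob_map[y0:y1, x0:x1].ravel()
        local = local / local.sum()
        idx = rng.choice(local.size, p=local)
        dy, dx = np.unravel_index(idx, (y1 - y0, x1 - x0))
        y, x = y0 + int(dy), x0 + int(dx)
        sequence.append((y, x))
    return sequence  # viewport centers, later projected from the ERP image
```

Calling this once per starting point yields the multiple pseudo viewport sequences that the multi-assessor setting requires.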

Efficient Image Super-Resolution using Vast-Receptive-Field Attention

Oct 12, 2022
Lin Zhou, Haoming Cai, Jinjin Gu, Zheyuan Li, Yingqi Liu, Xiangyu Chen, Yu Qiao, Chao Dong

The attention mechanism plays a pivotal role in designing advanced super-resolution (SR) networks. In this work, we design an efficient SR network by improving the attention mechanism. We start from a simple pixel attention module and gradually modify it to achieve better super-resolution performance with fewer parameters. The specific approaches include: (1) increasing the receptive field of the attention branch, (2) replacing large dense convolution kernels with depth-wise separable convolutions, and (3) introducing pixel normalization. These approaches paint a clear evolutionary roadmap for the design of attention mechanisms. Based on these observations, we propose VapSR, the VAst-receptive-field Pixel attention network. Experiments demonstrate the superior performance of VapSR: it outperforms existing lightweight networks with even fewer parameters, and its light version needs only 21.68% and 28.18% of the parameters of IMDN and RFDN, respectively, to achieve comparable performance. The code and models are available at https://github.com/zhoumumu/VapSR.
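
The three design moves in the abstract translate naturally into a small PyTorch module. The sketch below is a hedged reading of them, not the released VapSR code: the kernel size, branch layout, and exact form of "pixel normalization" are assumptions.

```python
import torch
import torch.nn as nn


def pixel_norm(x, eps=1e-6):
    # Normalize each spatial location's channel vector (our assumption
    # on the exact form of "pixel normalization").
    mean = x.mean(dim=1, keepdim=True)
    var = x.var(dim=1, keepdim=True, unbiased=False)
    return (x - mean) / torch.sqrt(var + eps)


class VastReceptiveFieldAttention(nn.Module):
    """Pixel attention whose branch uses a depth-wise large-kernel
    convolution in place of a large dense kernel (illustrative sizes)."""

    def __init__(self, channels, large_kernel=7):
        super().__init__()
        self.proj_in = nn.Conv2d(channels, channels, 1)
        # Depth-wise large kernel: a vast receptive field at a fraction
        # of the parameters of a dense KxK convolution.
        self.depthwise = nn.Conv2d(channels, channels, large_kernel,
                                   padding=large_kernel // 2,
                                   groups=channels)
        self.proj_out = nn.Conv2d(channels, channels, 1)

    def forward(self, x):
        a = self.proj_in(x)
        a = self.depthwise(a)
        a = pixel_norm(a)
        attn = torch.sigmoid(self.proj_out(a))
        return x * attn  # per-pixel, per-channel gating
```

A depth-wise 7x7 kernel costs 49*C weights instead of the 49*C^2 of a dense one, which is what makes the vast receptive field affordable.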

Super-Resolution by Predicting Offsets: An Ultra-Efficient Super-Resolution Network for Rasterized Images

Oct 09, 2022
Jinjin Gu, Haoming Cai, Chenyu Dong, Ruofan Zhang, Yulun Zhang, Wenming Yang, Chun Yuan

Rendering high-resolution (HR) graphics incurs substantial computational cost. Efficient graphics super-resolution (SR) methods can achieve HR rendering with modest computing resources and have attracted extensive interest in industry and academia. We present a new method for real-time SR of computer graphics, namely Super-Resolution by Predicting Offsets (SRPO). Our algorithm divides the image into two parts for processing: sharp edges and flatter areas. For edges, unlike previous SR methods that take anti-aliased images as input, SRPO exploits the characteristics of rasterized images and performs SR directly on them. To recover the residual between the HR and low-resolution (LR) rasterized images, we train an ultra-efficient network to predict offset maps that move appropriate surrounding pixels to new positions. For flat areas, we find that simple interpolation already produces reasonable output. Finally, a guided fusion operation integrates the sharp edges generated by the network with the flat areas from interpolation to produce the final SR image. The proposed network contains only 8,434 parameters and can be further accelerated by network quantization. Extensive experiments show that SRPO achieves superior visual quality at a lower computational cost than existing state-of-the-art methods.
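
The core mechanics, predicting an offset map at HR resolution and gathering pixels from the rasterized LR image, can be sketched with `grid_sample`. Everything below (layer widths, nearest-neighbor gathering, an externally supplied edge mask, e.g. from the rasterizer) is an illustrative assumption rather than the SRPO architecture itself.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class OffsetSR(nn.Module):
    """Sketch of the SRPO idea: a tiny network predicts, for every HR
    pixel, an offset telling it which nearby rasterized-LR pixel to
    copy; flat regions fall back to plain interpolation."""

    def __init__(self, scale=4, hidden=16):
        super().__init__()
        self.scale = scale
        self.net = nn.Sequential(
            nn.Conv2d(3, hidden, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(hidden, 2 * scale * scale, 3, padding=1),
            nn.PixelShuffle(scale),  # -> 2 offset channels at HR size
        )

    def forward(self, lr, edge_mask):
        # lr: (B, 3, h, w); edge_mask: (B, 1, H, W), assumed given.
        b, _, h, w = lr.shape
        H, W = h * self.scale, w * self.scale
        offsets = self.net(lr)  # (B, 2, H, W), in LR-pixel units
        # Base HR sampling grid in normalized [-1, 1] coordinates.
        ys = torch.linspace(-1, 1, H, device=lr.device)
        xs = torch.linspace(-1, 1, W, device=lr.device)
        gy, gx = torch.meshgrid(ys, xs, indexing="ij")
        grid = torch.stack((gx, gy), dim=-1).expand(b, H, W, 2)
        # Shift each HR pixel by its predicted offset (grid units).
        step = 2.0 / torch.tensor([W, H], dtype=lr.dtype, device=lr.device)
        grid = grid + offsets.permute(0, 2, 3, 1) * step
        edges = F.grid_sample(lr, grid, mode="nearest", align_corners=True)
        flat = F.interpolate(lr, scale_factor=self.scale, mode="bilinear",
                             align_corners=False)
        # Guided fusion: network output on edges, interpolation elsewhere.
        return edge_mask * edges + (1 - edge_mask) * flat
```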

* This article has been accepted by ECCV 2022 

NTIRE 2022 Challenge on Perceptual Image Quality Assessment

Jun 23, 2022
Jinjin Gu, Haoming Cai, Chao Dong, Jimmy S. Ren, Radu Timofte

This paper reports on the NTIRE 2022 challenge on perceptual image quality assessment (IQA), held in conjunction with the New Trends in Image Restoration and Enhancement (NTIRE) workshop at CVPR 2022. The challenge addresses the emerging difficulty of assessing images produced by perceptual image processing algorithms, whose outputs have completely different characteristics from traditional distortions and are included in the PIPAL dataset used in this challenge. The challenge is divided into two tracks: a full-reference IQA track similar to the previous NTIRE IQA challenge, and a new track focusing on no-reference IQA methods. The two tracks had 192 and 179 registered participants, respectively. In the final testing stage, 7 and 8 participating teams submitted their models and fact sheets. Almost all of them achieved better results than existing IQA methods, and the winning method demonstrates state-of-the-art performance.

* This report has been published in CVPR 2022 NTIRE workshop. arXiv admin note: text overlap with arXiv:2105.03072 

Blueprint Separable Residual Network for Efficient Image Super-Resolution

May 12, 2022
Zheyuan Li, Yingqi Liu, Xiangyu Chen, Haoming Cai, Jinjin Gu, Yu Qiao, Chao Dong

Recent advances in single image super-resolution (SISR) have achieved extraordinary performance, but the computational cost is too heavy for deployment on edge devices. To alleviate this problem, many novel and effective solutions have been proposed. Convolutional neural networks (CNNs) with attention mechanisms have attracted increasing attention due to their efficiency and effectiveness. However, there is still redundancy in the convolution operation. In this paper, we propose the Blueprint Separable Residual Network (BSRN), which contains two efficient designs. One is the use of blueprint separable convolution (BSConv), which replaces the redundant standard convolution operation. The other is to enhance model capacity by introducing more effective attention modules. Experimental results show that BSRN achieves state-of-the-art performance among existing efficient SR methods. Moreover, a smaller variant of our model, BSRN-S, won first place in the model complexity track of the NTIRE 2022 Efficient SR Challenge. The code is available at https://github.com/xiaom233/BSRN.
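
For reference, blueprint separable convolution (in its unconstrained BSConv-U form) is a 1x1 pointwise convolution followed by a depth-wise convolution, i.e., the reverse of the usual depthwise-then-pointwise ordering. A minimal PyTorch sketch of this building block; BSRN's residual structure and attention modules are omitted:

```python
import torch.nn as nn


class BSConvU(nn.Module):
    """Blueprint separable convolution (unconstrained variant):
    pointwise 1x1 mixing across channels, then a KxK depth-wise
    convolution applied to each resulting channel independently."""

    def __init__(self, in_ch, out_ch, kernel_size=3):
        super().__init__()
        self.pointwise = nn.Conv2d(in_ch, out_ch, 1, bias=False)
        self.depthwise = nn.Conv2d(out_ch, out_ch, kernel_size,
                                   padding=kernel_size // 2,
                                   groups=out_ch)

    def forward(self, x):
        return self.depthwise(self.pointwise(x))
```

Against a dense KxK convolution's K*K*in_ch*out_ch weights, this costs in_ch*out_ch + K*K*out_ch, which is where the efficiency gain comes from.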

* Accepted to CVPR Workshops 

NTIRE 2022 Challenge on Efficient Super-Resolution: Methods and Results

May 11, 2022
Yawei Li, Kai Zhang, Radu Timofte, Luc Van Gool, Fangyuan Kong, Mingxi Li, Songwei Liu, Zongcai Du, Ding Liu, Chenhui Zhou, Jingyi Chen, Qingrui Han, Zheyuan Li, Yingqi Liu, Xiangyu Chen, Haoming Cai, Yu Qiao, Chao Dong, Long Sun, Jinshan Pan, Yi Zhu, Zhikai Zong, Xiaoxiao Liu, Zheng Hui, Tao Yang, Peiran Ren, Xuansong Xie, Xian-Sheng Hua, Yanbo Wang, Xiaozhong Ji, Chuming Lin, Donghao Luo, Ying Tai, Chengjie Wang, Zhizhong Zhang, Yuan Xie, Shen Cheng, Ziwei Luo, Lei Yu, Zhihong Wen, Qi Wu, Youwei Li, Haoqiang Fan, Jian Sun, Shuaicheng Liu, Yuanfei Huang, Meiguang Jin, Hua Huang, Jing Liu, Xinjian Zhang, Yan Wang, Lingshun Long, Gen Li, Yuanfan Zhang, Zuowei Cao, Lei Sun, Panaetov Alexander, Yucong Wang, Minjie Cai, Li Wang, Lu Tian, Zheyuan Wang, Hongbing Ma, Jie Liu, Chao Chen, Yidong Cai, Jie Tang, Gangshan Wu, Weiran Wang, Shirui Huang, Honglei Lu, Huan Liu, Keyan Wang, Jun Chen, Shi Chen, Yuchun Miao, Zimo Huang, Lefei Zhang, Mustafa Ayazoğlu, Wei Xiong, Chengyi Xiong, Fei Wang, Hao Li, Ruimian Wen, Zhijing Yang, Wenbin Zou, Weixin Zheng, Tian Ye, Yuncheng Zhang, Xiangzhen Kong, Aditya Arora, Syed Waqas Zamir, Salman Khan, Munawar Hayat, Fahad Shahbaz Khan, Dandan Gao, Dengwen Zhou, Qian Ning, Jingzhu Tang, Han Huang, Yufei Wang, Zhangheng Peng, Haobo Li, Wenxue Guan, Shenghua Gong, Xin Li, Jun Liu, Wanjun Wang, Dengwen Zhou, Kun Zeng, Hanjiang Lin, Xinyu Chen, Jinsheng Fang

This paper reviews the NTIRE 2022 challenge on efficient single image super-resolution, with a focus on the proposed solutions and results. The task of the challenge was to super-resolve an input image with a magnification factor of ×4 based on pairs of low- and corresponding high-resolution images. The aim was to design a network for single image super-resolution that improves efficiency as measured by several metrics, including runtime, parameters, FLOPs, activations, and memory consumption, while maintaining at least 29.00 dB PSNR on the DIV2K validation set. IMDN is set as the baseline for efficiency measurement. The challenge had three tracks: the main track (runtime), sub-track one (model complexity), and sub-track two (overall performance). In the main track, the practical runtime performance of the submissions was evaluated, and the teams were ranked directly by the average runtime on the validation and test sets. In sub-track one, the number of parameters and FLOPs were considered, and the individual rankings of the two metrics were summed to determine the final ranking. In sub-track two, all five metrics mentioned in the challenge description (runtime, parameter count, FLOPs, activations, and memory consumption) were considered; as in sub-track one, the rankings across the five metrics were summed to determine the final ranking. The challenge had 303 registered participants, and 43 teams made valid submissions. Together they gauge the state of the art in efficient single image super-resolution.
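
To make the sub-track scoring concrete, here is a toy illustration of rank-summing across metrics; the team names and numbers are invented, and tie handling is simplified:

```python
def summed_rank(teams):
    """Rank teams on each metric independently (lower is better for
    every metric listed above), then sum the per-metric ranks; the
    lowest total wins the sub-track."""
    metrics = teams[next(iter(teams))].keys()
    totals = {name: 0 for name in teams}
    for m in metrics:
        ordered = sorted(teams, key=lambda name: teams[name][m])
        for rank, name in enumerate(ordered, start=1):
            totals[name] += rank
    return sorted(totals.items(), key=lambda kv: kv[1])


# Hypothetical sub-track-one entries: parameters (M) and FLOPs (G).
entries = {
    "team_a": {"params": 0.24, "flops": 11.5},
    "team_b": {"params": 0.31, "flops": 10.9},
    "team_c": {"params": 0.27, "flops": 12.3},
}
print(summed_rank(entries))  # -> [('team_a', 3), ('team_b', 4), ('team_c', 5)]
```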

* Validation code of the baseline model is available at https://github.com/ofsoundof/IMDN. Validation of all submitted models is available at https://github.com/ofsoundof/NTIRE2022_ESR 

NTIRE 2021 Challenge on Perceptual Image Quality Assessment

May 11, 2021
Jinjin Gu, Haoming Cai, Chao Dong, Jimmy S. Ren, Yu Qiao, Shuhang Gu, Radu Timofte, Manri Cheon, Sungjun Yoon, Byungyeon Kang, Junwoo Lee, Qing Zhang, Haiyang Guo, Yi Bin, Yuqing Hou, Hengliang Luo, Jingyu Guo, Zirui Wang, Hai Wang, Wenming Yang, Qingyan Bai, Shuwei Shi, Weihao Xia, Mingdeng Cao, Jiahao Wang, Yifan Chen, Yujiu Yang, Yang Li, Tao Zhang, Longtao Feng, Yiting Liao, Junlin Li, William Thong, Jose Costa Pereira, Ales Leonardis, Steven McDonagh, Kele Xu, Lehan Yang, Hengxing Cai, Pengfei Sun, Seyed Mehdi Ayyoubzadeh, Ali Royat, Sid Ahmed Fezza, Dounia Hammou, Wassim Hamidouche, Sewoong Ahn, Gwangjin Yoon, Koki Tsubota, Hiroaki Akutsu, Kiyoharu Aizawa

This paper reports on the NTIRE 2021 challenge on perceptual image quality assessment (IQA), held in conjunction with the New Trends in Image Restoration and Enhancement (NTIRE) workshop at CVPR 2021. As a new type of image processing technique, perceptual image processing algorithms based on Generative Adversarial Networks (GANs) produce images with more realistic textures. These outputs have completely different characteristics from traditional distortions and thus pose a new challenge for IQA methods to evaluate their visual quality. In contrast to previous IQA challenges, the training and testing datasets in this challenge include the outputs of perceptual image processing algorithms together with the corresponding subjective scores, so they can be used to develop and evaluate IQA methods on GAN-based distortions. The challenge had 270 registered participants in total. In the final testing stage, 13 participating teams submitted their models and fact sheets. Almost all of them achieved much better results than existing IQA methods, and the winning method demonstrates state-of-the-art performance.

Toward Interactive Modulation for Photo-Realistic Image Restoration

May 07, 2021
Haoming Cai, Jingwen He, Yu Qiao, Chao Dong

Modulating the image restoration level aims to generate a restored image by altering a factor that represents the restoration strength. Previous works mainly focused on optimizing the mean squared reconstruction error, which yields high reconstruction accuracy but lacks fine texture details. This paper presents the Controllable UNet Generative Adversarial Network (CUGAN), which generates high-frequency textures in modulation tasks. CUGAN consists of two modules: base networks and condition networks. The base networks comprise a generator and a discriminator. In the generator, we realize interactive control of restoration levels by tuning the weights of features from different scales in the UNet architecture. Moreover, we adaptively modulate the intermediate features in the discriminator according to the severity of the degradations. The condition networks accept the condition vector (encoded degradation information) as input and generate modulation parameters for both the generator and the discriminator. At test time, users control the output by tweaking the condition vector, and a simple transition method provides a smooth trade-off between GAN and MSE effects. Extensive experiments demonstrate that CUGAN achieves excellent performance on image restoration modulation tasks.
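
A hedged sketch of the modulation mechanism described above: a small condition network maps the condition vector to per-channel weights that rescale features at one scale of the generator, and test-time interpolation of condition vectors gives the smooth GAN-MSE transition. Dimensions, the injection point, and the blending scheme are assumptions, not CUGAN's exact design.

```python
import torch
import torch.nn as nn


class ConditionedScale(nn.Module):
    """Condition network for one scale: maps the condition vector
    (encoded degradation / restoration strength) to per-channel
    weights that rescale that scale's feature map."""

    def __init__(self, cond_dim, channels):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(cond_dim, channels), nn.ReLU(inplace=True),
            nn.Linear(channels, channels),
        )

    def forward(self, feat, cond):
        # cond: (B, cond_dim) -> (B, C, 1, 1), broadcast over H and W.
        scale = self.mlp(cond).unsqueeze(-1).unsqueeze(-1)
        return feat * scale


def blend_condition(cond_mse, cond_gan, alpha):
    # Smooth GAN <-> MSE transition by interpolating condition vectors:
    # alpha = 0 gives MSE-like output, alpha = 1 the full GAN effect.
    return (1 - alpha) * cond_mse + alpha * cond_gan
```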

Image Quality Assessment for Perceptual Image Restoration: A New Dataset, Benchmark and Metric

Nov 30, 2020
Jinjin Gu, Haoming Cai, Haoyu Chen, Xiaoxing Ye, Jimmy Ren, Chao Dong

Image quality assessment (IQA) is a key factor in the rapid development of image restoration (IR) algorithms. The most recent perceptual IR algorithms, based on generative adversarial networks (GANs), have brought significant improvements in visual performance but pose great challenges for quantitative evaluation. Notably, we observe an increasing inconsistency between perceptual quality and evaluation results. We pose two questions: Can existing IQA methods objectively evaluate recent IR algorithms? With the focus on beating current benchmarks, are we getting better IR algorithms? To answer these questions and promote the development of IQA methods, we contribute a large-scale IQA dataset, the Perceptual Image Processing ALgorithms (PIPAL) dataset. In particular, this dataset includes the results of GAN-based IR algorithms, which are missing from previous datasets. We collect more than 1.13 million human judgments to assign subjective scores to PIPAL images using the more reliable Elo rating system. Based on PIPAL, we present new benchmarks for both IQA and SR methods. Our results indicate that existing IQA methods cannot fairly evaluate GAN-based IR algorithms. While using appropriate evaluation methods is important, IQA methods should also be updated alongside the development of IR algorithms. Finally, we shed light on how to improve IQA performance on GAN-based distortions. Motivated by the finding that existing IQA methods perform poorly on GAN-based distortions partly because of their low tolerance to spatial misalignment, we propose to improve an IQA network's performance on GAN-based distortions by explicitly accounting for this misalignment. We propose the Space Warping Difference Network, which includes novel l_2 pooling layers and Space Warping Difference layers. Experiments demonstrate the effectiveness of the proposed method.
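
As background on the scoring step, a standard Elo update turns each pairwise "which looks better?" judgment into a rating adjustment; repeated over the 1.13 million judgments, the final ratings serve as subjective scores. The K-factor and base rating below are conventional Elo defaults, not values from the paper.

```python
def elo_update(r_a, r_b, outcome, k=32):
    """One Elo update for a pairwise comparison between images A and B.
    `outcome` is 1.0 if A was preferred, 0.0 if B was, 0.5 for a tie."""
    expected_a = 1.0 / (1.0 + 10 ** ((r_b - r_a) / 400.0))
    r_a_new = r_a + k * (outcome - expected_a)
    r_b_new = r_b + k * ((1.0 - outcome) - (1.0 - expected_a))
    return r_a_new, r_b_new


# Example: image A (1500) beats image B (1500) in one comparison.
print(elo_update(1500.0, 1500.0, outcome=1.0))  # -> (1516.0, 1484.0)
```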

* arXiv admin note: substantial text overlap with arXiv:2007.12142 