Haisheng Fu

Fast and High-Performance Learned Image Compression With Improved Checkerboard Context Model, Deformable Residual Module, and Knowledge Distillation

Sep 05, 2023
Haisheng Fu, Feng Liang, Jie Liang, Yongqiang Wang, Guohe Zhang, Jingning Han

Deep learning-based image compression has made great progress recently. However, many leading schemes use a serial context-adaptive entropy model to improve the rate-distortion (R-D) performance, which is very slow. In addition, the complexities of the encoding and decoding networks are quite high, making them unsuitable for many practical applications. In this paper, we introduce four techniques to balance the trade-off between complexity and performance. First, we are the first to introduce the deformable convolutional module into an image compression framework; it removes more redundancies from the input image and thereby enhances compression performance. Second, we design a checkerboard context model with two separate distribution parameter estimation networks and different probability models, which enables parallel decoding without sacrificing performance compared to the sequential context-adaptive model. Third, we develop an improved three-step knowledge distillation and training scheme to achieve different trade-offs between the complexity and performance of the decoder network, transferring both the final and intermediate results of the teacher network to the student network to aid its training. Fourth, we introduce $L_{1}$ regularization to make the latent representation more sparse, and then encode only the non-zero channels, which greatly reduces the encoding and decoding time. Experiments show that, compared to the state-of-the-art learned image coding scheme, our method is about 20 times faster in encoding and 70-90 times faster in decoding, while our R-D performance is also $2.3 \%$ higher. Our method outperforms the traditional H.266/VVC-intra (4:4:4) codec and some leading learned schemes in terms of PSNR and MS-SSIM when tested on the Kodak and Tecnick-40 datasets.
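
Two of the four techniques lend themselves to short sketches. The following is a minimal, hypothetical illustration (not the paper's code): a boolean checkerboard mask splits latent positions into anchors, decoded first from the hyperprior alone, and non-anchors, decoded in parallel conditioned on the anchors; an $L_{1}$ penalty pushes latent channels toward zero so that all-zero channels can be skipped during entropy coding. The mask construction, penalty weight, and toy convolution are assumptions.

```python
# Minimal sketch (assumptions, not the paper's implementation) of a
# checkerboard split for two-pass parallel decoding, plus an L1 penalty
# that encourages sparse latent channels.
import torch
import torch.nn.functional as F

def checkerboard_masks(h, w, device="cpu"):
    """Boolean masks: anchors are decoded first from the hyperprior alone;
    non-anchors are then decoded in parallel, conditioned on the anchors."""
    ij = torch.arange(h, device=device)[:, None] + torch.arange(w, device=device)[None, :]
    anchor = (ij % 2 == 0)                  # checkerboard pattern
    return anchor, ~anchor

def sparsity_loss(latent, weight=1e-4):
    """L1 regularization on the latents (B, C, H, W); channels whose values
    collapse to zero need not be entropy-coded at all."""
    return weight * latent.abs().mean()

# toy usage
y = torch.randn(1, 192, 16, 16)
anchor, non_anchor = checkerboard_masks(16, 16)
y_anchor = y * anchor                       # context visible to the second pass
ctx = F.conv2d(y_anchor, torch.randn(192, 192, 5, 5), padding=2)
print(sparsity_loss(y).item(), ctx.shape)
```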

* Submitted to Trans. Journal 

Enhanced Residual SwinV2 Transformer for Learned Image Compression

Aug 23, 2023
Yongqiang Wang, Feng Liang, Haisheng Fu, Jie Liang, Haipeng Qin, Junzhe Liang

Recently, deep learning has been successfully applied to image compression, leading to superior rate-distortion performance. However, a challenge of many learning-based approaches is that they often achieve better performance at the cost of increased complexity, which makes practical deployment difficult. To alleviate this issue, we propose an effective and efficient learned image compression framework based on an enhanced residual SwinV2 transformer. To enhance the nonlinear representation of images in our framework, we use a feature enhancement module consisting of three consecutive convolutional layers. In the subsequent coding and hyper-coding steps, we utilize a SwinV2 transformer-based attention mechanism to process the input image. The SwinV2 model helps reduce model complexity while maintaining high performance. Experimental results show that the proposed method achieves performance comparable to some recent learned image compression methods on the Kodak and Tecnick datasets and outperforms some traditional codecs, including VVC. In particular, our method achieves comparable results while reducing model complexity by 56% compared to these recent methods.
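
As a rough illustration of the feature enhancement module described above (three consecutive convolutional layers), here is a hypothetical PyTorch sketch; the channel width, activations, and residual shortcut are assumptions rather than details from the paper.

```python
# Hypothetical feature enhancement module: three consecutive conv layers
# with a residual shortcut to strengthen the nonlinear representation.
import torch
import torch.nn as nn

class FeatureEnhancement(nn.Module):
    def __init__(self, channels=128):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(channels, channels, 3, padding=1), nn.GELU(),
            nn.Conv2d(channels, channels, 3, padding=1), nn.GELU(),
            nn.Conv2d(channels, channels, 3, padding=1),
        )

    def forward(self, x):
        return x + self.body(x)   # enhanced features, same shape as input

x = torch.randn(1, 128, 32, 32)
print(FeatureEnhancement()(x).shape)
```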

ROI-based Deep Image Compression with Swin Transformers

May 12, 2023
Binglin Li, Jie Liang, Haisheng Fu, Jingning Han

Encoding the Region of Interest (ROI) with better quality than the background has many applications, including video conferencing, video surveillance, and object-oriented vision tasks. In this paper, we propose an ROI-based image compression framework with Swin transformers as the main building blocks of the autoencoder network. The binary ROI mask is integrated into different layers of the network to provide spatial guidance. Based on the ROI mask, we can control the relative importance of the ROI and non-ROI regions by modifying the corresponding Lagrange multiplier $ \lambda $ for different regions. Experimental results show that our model achieves higher ROI PSNR than other methods and modest average PSNR for human evaluation. When tested on models pre-trained with original images, it achieves superior object detection and instance segmentation performance on the COCO validation dataset.
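
The region-wise Lagrange multiplier can be made concrete with a small sketch. The loss below, with hypothetical names and weights (the paper's exact form may differ), weights the squared error inside the ROI mask by a larger $ \lambda $ than the background, steering bits toward the ROI.

```python
# Hedged sketch of a region-wise R-D loss: distortion inside the ROI gets a
# larger Lagrange multiplier than the background. Weights are placeholders.
import torch

def roi_rd_loss(x, x_hat, bits, roi_mask, lam_roi=0.03, lam_bg=0.003):
    """x, x_hat: (B,3,H,W); roi_mask: (B,1,H,W) binary; bits: scalar rate."""
    se = (x - x_hat) ** 2
    d_roi = (se * roi_mask).sum() / roi_mask.sum().clamp(min=1)
    d_bg = (se * (1 - roi_mask)).sum() / (1 - roi_mask).sum().clamp(min=1)
    return bits + lam_roi * d_roi + lam_bg * d_bg   # rate + weighted distortion

# toy usage
x, x_hat = torch.rand(1, 3, 64, 64), torch.rand(1, 3, 64, 64)
mask = (torch.rand(1, 1, 64, 64) > 0.7).float()
print(roi_rd_loss(x, x_hat, torch.tensor(0.5), mask).item())
```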

* This paper has been accepted by ICASSP 2023 

Learned Image Compression with Generalized Octave Convolution and Cross-Resolution Parameter Estimation

Sep 07, 2022
Haisheng Fu, Feng Liang

The context-adaptive entropy model significantly improves rate-distortion (R-D) performance by jointly utilizing hyperpriors and autoregressive models to capture the spatial redundancy of the latent representations. However, the latents still contain some spatial correlations. In addition, methods based on the context-adaptive entropy model cannot be accelerated in the decoding process by parallel computing devices such as FPGAs or GPUs. To alleviate these limitations, we propose a learned multi-resolution image compression framework that exploits the recently developed octave convolutions to factorize the latent representations into high-resolution (HR) and low-resolution (LR) parts, similar to a wavelet transform, which further improves the R-D performance. To speed up decoding, our scheme does not use a context-adaptive entropy model. Instead, we exploit an additional hyper layer, including a hyper encoder and hyper decoder, to further remove the spatial redundancy of the latent representation. Moreover, cross-resolution parameter estimation (CRPE) is introduced into the proposed framework to enhance the flow of information and further improve the R-D performance. An additional information-fidelity loss is added to the total loss function to adjust the contribution of the LR part to the final bit stream. Experimental results show that our method reduces the decoding time by approximately 73.35 % and 93.44 %, respectively, compared with two state-of-the-art learned image compression methods, while the R-D performance remains better than that of H.266/VVC (4:2:0) and some learning-based methods in both PSNR and MS-SSIM metrics across a wide range of bit rates.
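
To make the HR/LR factorization concrete, here is a simplified octave-convolution sketch, reflecting my assumption of the basic mechanism the abstract references rather than the paper's generalized variant: the latents are split into a full-resolution part and a half-resolution part, with four information paths between them.

```python
# Simplified (non-generalized) octave convolution: HR and LR feature maps
# exchange information via four conv paths (H->H, H->L, L->L, L->H).
import torch
import torch.nn as nn
import torch.nn.functional as F

class OctaveConv(nn.Module):
    def __init__(self, ch_h=128, ch_l=64):
        super().__init__()
        self.h2h = nn.Conv2d(ch_h, ch_h, 3, padding=1)
        self.h2l = nn.Conv2d(ch_h, ch_l, 3, padding=1)
        self.l2l = nn.Conv2d(ch_l, ch_l, 3, padding=1)
        self.l2h = nn.Conv2d(ch_l, ch_h, 3, padding=1)

    def forward(self, x_h, x_l):
        # HR output: HR path + upsampled LR contribution
        y_h = self.h2h(x_h) + F.interpolate(self.l2h(x_l), scale_factor=2, mode="nearest")
        # LR output: LR path + downsampled HR contribution
        y_l = self.l2l(x_l) + self.h2l(F.avg_pool2d(x_h, 2))
        return y_h, y_l

h, l = torch.randn(1, 128, 32, 32), torch.randn(1, 64, 16, 16)
yh, yl = OctaveConv()(h, l)
print(yh.shape, yl.shape)
```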

* Accepted by Signal Processing 

Asymmetric Learned Image Compression with Multi-Scale Residual Block, Importance Map, and Post-Quantization Filtering

Jun 21, 2022
Haisheng Fu, Feng Liang, Jie Liang, Binglin Li, Guohe Zhang, Jingning Han

Recently, deep learning-based image compression has made significant progress and has achieved better rate-distortion (R-D) performance than the latest traditional method, H.266/VVC, in both the subjective metric and the more challenging objective metric. However, a major problem is that many leading learned schemes cannot maintain a good trade-off between performance and complexity. In this paper, we propose an efficient and effective image coding framework, which achieves similar R-D performance with lower complexity than the state of the art. First, we develop an improved multi-scale residual block (MSRB) that expands the receptive field and more easily obtains global information; it can further capture and reduce the spatial correlation of the latent representations. Second, a more advanced importance map network is introduced to adaptively allocate bits to different regions of the image. Third, we apply a 2D post-quantization filter (PQF) to reduce the quantization error, motivated by the Sample Adaptive Offset (SAO) filter in video coding. Moreover, we find that the complexities of the encoder and decoder have different effects on image compression performance. Based on this observation, we design an asymmetric paradigm in which the encoder employs three stages of MSRBs to improve the learning capacity, whereas the decoder only needs one stage of MSRB to yield a satisfactory reconstruction, thereby reducing the decoding complexity without sacrificing performance. Experimental results show that, compared to the state-of-the-art method, the encoding and decoding times of the proposed method are about 17 times faster, and the R-D performance is reduced by less than 1% on both the Kodak and Tecnick datasets, which is still better than H.266/VVC (4:4:4) and other recent learning-based methods. Our source code is publicly available at https://github.com/fengyurenpingsheng.
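
The importance map idea can be sketched as follows. This is a hedged toy version, not the paper's network: the module name, the 1x1 prediction head, and the multiplicative gating are my assumptions. A learned map in [0, 1] scales the latents so that fewer bits are spent on unimportant regions.

```python
# Hypothetical importance-map gate: a learned per-position map in [0, 1]
# attenuates latents in simple regions, implicitly allocating fewer bits there.
import torch
import torch.nn as nn

class ImportanceGate(nn.Module):
    def __init__(self, channels=192):
        super().__init__()
        self.head = nn.Sequential(nn.Conv2d(channels, 1, 1), nn.Sigmoid())

    def forward(self, y):
        m = self.head(y)          # (B,1,H,W) importance map
        return y * m, m           # gated latents and the map itself

y = torch.randn(1, 192, 16, 16)
y_gated, m = ImportanceGate()(y)
print(y_gated.shape, float(m.min()) >= 0.0)
```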

* IEEE Transactions on Multimedia 

Learned Image Compression with Discretized Gaussian-Laplacian-Logistic Mixture Model and Concatenated Residual Modules

Jul 18, 2021
Haisheng Fu, Feng Liang, Jianping Lin, Bing Li, Mohammad Akbari, Jie Liang, Guohe Zhang, Dong Liu, Chengjie Tu, Jingning Han

Recently, deep learning-based image compression methods have made significant progress and have gradually outperformed traditional approaches, including the latest standard, Versatile Video Coding (VVC), in both PSNR and MS-SSIM metrics. Two key components of learned image compression frameworks are the entropy model of the latent representations and the encoding/decoding network architectures. Various entropy models have been proposed, such as autoregressive, softmax, logistic mixture, Gaussian mixture, and Laplacian, but existing schemes use only one of them. However, due to the vast diversity of images, it is not optimal to use one model for all images, or even for different regions of one image. In this paper, we propose a more flexible discretized Gaussian-Laplacian-Logistic mixture model (GLLMM) for the latent representations, which can adapt more accurately to the different contents of different images and different regions of one image. In the encoding/decoding network design, we propose a concatenated residual block (CRB), in which multiple residual blocks are serially connected with additional shortcut connections. The CRB improves the learning ability of the network, which further improves compression performance. Experimental results on the Kodak and Tecnick datasets show that the proposed scheme outperforms state-of-the-art learning-based methods and existing compression standards, including VVC intra coding (4:4:4 and 4:2:0), in terms of both PSNR and MS-SSIM. The project page is at \url{https://github.com/fengyurenpingsheng/Learned-image-compression-with-GLLMM}
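
The core of the GLLMM is the discretized mixture likelihood: each component's probability mass on the quantization bin $[y-0.5, y+0.5]$ comes from its CDF, and the three masses are blended by softmax mixture weights. Below is a minimal numerical sketch of my reading of that idea; the function name and parameterization are assumptions, not the released code.

```python
# Hedged sketch of a discretized Gaussian-Laplacian-Logistic mixture
# likelihood: per-component bin mass via CDF differences, mixed by softmax.
import math
import torch

def discretized_gllmm(y, mu, scale, logits):
    """y: (...,) quantized latents; mu, scale, logits: (..., 3) parameters of
    the [Gaussian, Laplacian, Logistic] components. Returns P(y) per element."""
    w = torch.softmax(logits, dim=-1)                 # mixture weights

    def cdf(v):
        g = 0.5 * (1 + torch.erf((v - mu[..., 0]) / (scale[..., 0] * math.sqrt(2))))
        z = (v - mu[..., 1]) / scale[..., 1]
        lap = 0.5 + 0.5 * z.sign() * (1 - torch.exp(-z.abs()))
        logi = torch.sigmoid((v - mu[..., 2]) / scale[..., 2])
        return torch.stack([g, lap, logi], dim=-1)    # (..., 3)

    mass = (cdf(y + 0.5) - cdf(y - 0.5)).clamp(min=1e-9)  # mass on each bin
    return (w * mass).sum(dim=-1)

# toy usage: zero-mean, unit-scale components, uniform weights
y = torch.round(torch.randn(2, 192, 8, 8))
shape = (*y.shape, 3)
p = discretized_gllmm(y, torch.zeros(shape), torch.ones(shape), torch.zeros(shape))
print(p.shape, float(-p.log2().mean()))               # average bits per element
```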

* Submitted to IEEE Transactions On Image Processing 

Improved Hybrid Layered Image Compression using Deep Learning and Traditional Codecs

Jul 15, 2019
Haisheng Fu, Feng Liang, Bo Lei, Nai Bian, Qian Zhang, Mohammad Akbari, Jie Liang, Chengjie Tu

Recently, deep learning-based methods have been applied to image compression and have achieved many promising results. In this paper, we propose an improved hybrid layered image compression framework that combines deep learning with traditional image codecs. At the encoder, we first use a convolutional neural network (CNN) to obtain a compact representation of the input image, which is losslessly encoded by the FLIF codec as the base layer of the bit stream. A coarse reconstruction of the input is obtained by another CNN from the reconstructed compact representation. The residual between the input and the coarse reconstruction is then encoded by the H.265/HEVC-based BPG codec as the enhancement layer of the bit stream. Experimental results on the Kodak and Tecnick datasets show that the proposed scheme outperforms the state-of-the-art deep learning-based layered coding scheme and traditional codecs, including BPG, in both PSNR and MS-SSIM metrics across a wide range of bit rates when the images are coded in the RGB444 domain.
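
The layered pipeline can be summarized schematically. In the sketch below, flif_encode and bpg_encode are placeholders for the external FLIF and BPG tools, and cnn_analysis/cnn_synthesis stand in for the paper's CNNs; none of these names come from the paper.

```python
# Schematic sketch of the hybrid layered encoder: CNN compact representation
# -> lossless FLIF base layer; residual of coarse CNN reconstruction
# -> lossy BPG enhancement layer. All callables are placeholders.
def encode_layered(image, cnn_analysis, cnn_synthesis, flif_encode, bpg_encode):
    compact = cnn_analysis(image)        # compact representation of the input
    base_layer = flif_encode(compact)    # lossless base layer (FLIF)
    coarse = cnn_synthesis(compact)      # coarse reconstruction from the base
    residual = image - coarse            # detail the base layer missed
    enh_layer = bpg_encode(residual)     # lossy enhancement layer (BPG)
    return base_layer, enh_layer
```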

* Submitted to Signal Processing: Image Communication 

A Deep Image Compression Framework for Face Recognition

Jul 03, 2019
Nai Bian, Feng Liang, Haisheng Fu, Bo Lei

Face recognition technology has advanced rapidly and is widely used in various applications. Due to the enormous volume of face image data and the correspondingly large computing resources required in large-scale face recognition tasks, there is a need for a face image compression approach that is well suited to face recognition. In this paper, we propose a deep convolutional autoencoder compression network for face recognition tasks. In the compression process, deep features are extracted from the original image by convolutional neural networks to produce a compact representation, which is then encoded and saved by an existing codec such as PNG. This compact representation is used by the reconstruction network to generate a reconstruction of the original image. To improve face recognition accuracy when the compression framework is used in a face recognition system, we combine it with an existing face recognition network for joint optimization. We test the proposed scheme and find that, after joint training, the Labeled Faces in the Wild (LFW) dataset compressed by our framework achieves higher face verification accuracy than when compressed by JPEG2000, and much higher than when compressed by JPEG.
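
The joint optimization can be sketched as a combined objective; the weighting and the cross-entropy recognition head below are assumptions for illustration, not the paper's exact formulation.

```python
# Hedged sketch of joint compression/recognition training: reconstruction
# fidelity plus a recognition loss computed on the reconstructed faces.
import torch
import torch.nn.functional as F

def joint_loss(x, x_hat, rec_logits, labels, alpha=1.0, beta=0.1):
    """x, x_hat: original and reconstructed images; rec_logits: recognition
    network outputs on x_hat; labels: identity labels."""
    distortion = F.mse_loss(x_hat, x)                   # compression fidelity
    recognition = F.cross_entropy(rec_logits, labels)   # identity preservation
    return alpha * distortion + beta * recognition
```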
