Yuanchao Bai

Weakly-Supervised Monocular Depth Estimation with Resolution-Mismatched Data

Sep 23, 2021

Jialei Xu, Yuanchao Bai, Xianming Liu, Junjun Jiang, Xiangyang Ji

Abstract: Depth estimation from a single image is an active research topic in computer vision. The most accurate approaches are based on fully supervised learning models, which rely on a large amount of dense, high-resolution (HR) ground-truth depth maps. In practice, however, color images are usually captured at a much higher resolution than depth maps, which leads to a resolution mismatch. In this paper, we propose a novel weakly-supervised framework that trains a monocular depth estimation network to generate HR depth maps under resolution-mismatched supervision, i.e., the inputs are HR color images and the ground truth consists of low-resolution (LR) depth maps. The proposed framework is composed of a weight-sharing monocular depth estimation network and a depth reconstruction network for distillation. Specifically, for the monocular depth estimation network, the input color image is first downsampled to obtain an LR version with the same resolution as the ground-truth depth. Then, both the HR and LR color images are fed into the monocular depth estimation network to obtain the corresponding estimated depth maps. We introduce three losses to train the network: 1) a reconstruction loss between the estimated LR depth and the ground-truth LR depth; 2) a reconstruction loss between the downsampled estimated HR depth and the ground-truth LR depth; 3) a consistency loss between the estimated LR depth and the downsampled estimated HR depth. In addition, we design a depth-to-depth reconstruction network. Through a distillation loss, the features of the two networks maintain structural consistency in affinity space, which ultimately improves the performance of the estimation network. Experimental results demonstrate that our method outperforms unsupervised and semi-supervised learning-based schemes, and is competitive with or even better than supervised ones.
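
The three training losses listed in the abstract can be illustrated with a minimal sketch. The abstract does not state the downsampling operator or the distance used, so average pooling and a mean absolute (L1) error are assumptions here, and the function names `avg_pool`, `l1` and `weak_losses` are hypothetical. Depth maps are plain nested lists whose height and width are assumed to be divisible by the scale factor.

```python
def avg_pool(depth, factor):
    """Downsample a 2D list by averaging non-overlapping factor x factor blocks.

    Assumption: average pooling stands in for the (unspecified) downsampler.
    """
    h, w = len(depth), len(depth[0])
    out = []
    for i in range(0, h, factor):
        row = []
        for j in range(0, w, factor):
            block = [depth[i + di][j + dj]
                     for di in range(factor) for dj in range(factor)]
            row.append(sum(block) / len(block))
        out.append(row)
    return out


def l1(a, b):
    """Mean absolute difference between two equally sized 2D lists (assumed metric)."""
    n = len(a) * len(a[0])
    return sum(abs(x - y) for ra, rb in zip(a, b) for x, y in zip(ra, rb)) / n


def weak_losses(pred_lr, pred_hr, gt_lr, factor):
    """Return the three loss terms described in the abstract.

    pred_lr: depth estimated from the downsampled (LR) color image
    pred_hr: depth estimated from the HR color image
    gt_lr:   LR ground-truth depth
    """
    pred_hr_down = avg_pool(pred_hr, factor)
    return {
        "rec_lr": l1(pred_lr, gt_lr),                # 1) LR estimate vs. LR ground truth
        "rec_hr": l1(pred_hr_down, gt_lr),           # 2) downsampled HR estimate vs. LR ground truth
        "consistency": l1(pred_lr, pred_hr_down),    # 3) LR estimate vs. downsampled HR estimate
    }
```

How the three terms are weighted and combined with the distillation loss is not given in the abstract, so the sketch returns them separately.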

Learning Scalable $\ell_\infty$-constrained Near-lossless Image Compression via Joint Lossy Image and Residual Compression

Mar 31, 2021

Yuanchao Bai, Xianming Liu, Wangmeng Zuo, Yaowei Wang, Xiangyang Ji

Abstract: We propose a novel joint lossy image and residual compression framework for learning $\ell_\infty$-constrained near-lossless image compression. Specifically, we obtain a lossy reconstruction of the raw image through lossy image compression and uniformly quantize the corresponding residual to satisfy a given tight $\ell_\infty$ error bound. When the error bound is zero, i.e., lossless image compression, we formulate the joint optimization problem of compressing both the lossy image and the original residual in terms of variational auto-encoders and solve it with end-to-end training. To achieve scalable compression with an error bound larger than zero, we derive the probability model of the quantized residual by quantizing the learned probability model of the original residual, instead of training multiple networks. We further correct the bias of the derived probability model caused by the context mismatch between training and inference. Finally, the quantized residual is encoded according to the bias-corrected probability model and is concatenated with the bitstream of the compressed lossy image. Experimental results demonstrate that our near-lossless codec achieves state-of-the-art performance for lossless and near-lossless image compression, and achieves competitive PSNR with a much smaller $\ell_\infty$ error compared with lossy image codecs at high bit rates.

* Accepted by CVPR 2021; Code: https://github.com/BYchao100/Scalable-Near-lossless-Image-Compression
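
The residual quantization step and the derived probability model can be illustrated with a short sketch. The abstract only says the residual is "uniformly quantized" to meet an error bound; the specific quantizer below, with bin size $2\tau+1$ for an integer bound $\tau$, is the usual choice in near-lossless coding and is an assumption here, as are the names `quantize_residual` and `quantized_pmf`. The bias correction mentioned in the abstract is not covered.

```python
def quantize_residual(r, tau):
    """Uniformly quantize an integer residual r with bin size 2*tau + 1.

    Assumed quantizer: guarantees |r - r_hat| <= tau for integer r and tau >= 0.
    With tau = 0 it returns r unchanged (the lossless case).
    """
    sign = -1 if r < 0 else 1
    return sign * (2 * tau + 1) * ((abs(r) + tau) // (2 * tau + 1))


def quantized_pmf(pmf, tau):
    """Derive the PMF of the quantized residual from the PMF of the original one.

    pmf maps each integer residual to its probability; the mass of every
    residual falling into the same quantization bin is summed.
    """
    out = {}
    for r, p in pmf.items():
        r_hat = quantize_residual(r, tau)
        out[r_hat] = out.get(r_hat, 0.0) + p
    return out
```

Because the quantized model is computed from the model of the original residual, a single learned distribution can serve every value of $\tau$, which matches the abstract's point about not training multiple networks.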

FFA-Net: Feature Fusion Attention Network for Single Image Dehazing

Dec 05, 2019

Xu Qin, Zhilin Wang, Yuanchao Bai, Xiaodong Xie, Huizhu Jia

Abstract: In this paper, we propose an end-to-end feature fusion attention network (FFA-Net) to directly restore the haze-free image. The FFA-Net architecture consists of three key components: 1) A novel Feature Attention (FA) module that combines Channel Attention with a Pixel Attention mechanism, considering that different channel-wise features contain totally different weighted information and that haze is distributed unevenly across image pixels. FA treats different features and pixels unequally, which provides additional flexibility in dealing with different types of information and expands the representational ability of CNNs. 2) A basic block structure consisting of Local Residual Learning and Feature Attention. Local Residual Learning allows less important information, such as thin-haze regions or low-frequency content, to be bypassed through multiple local residual connections, letting the main network architecture focus on more effective information. 3) An attention-based Feature Fusion (FFA) structure across different levels, in which the feature weights are adaptively learned from the Feature Attention (FA) module, giving more weight to important features. This structure can also retain the information of shallow layers and pass it into deep layers. The experimental results demonstrate that our proposed FFA-Net surpasses previous state-of-the-art single image dehazing methods by a very large margin both quantitatively and qualitatively, boosting the best published PSNR from 30.23 dB to 36.39 dB on the SOTS indoor test dataset. Code has been made available on GitHub.

* Accepted by AAAI 2020
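
The idea of weighting channels and pixels unequally can be sketched in plain Python. This is not the released FFA-Net code: the layer layout (global average pooling followed by two 1x1 projections for channel attention, two per-pixel 1x1 projections producing a single-channel map for pixel attention) is an assumption, and all weight matrices and function names are hypothetical. Feature maps are nested lists of shape C x H x W.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def matvec(W, v):
    """Multiply a matrix (list of rows) by a vector; equivalent to a 1x1 convolution."""
    return [sum(w * x for w, x in zip(row, v)) for row in W]


def relu_vec(v):
    return [max(0.0, x) for x in v]


def channel_attention(feat, W1, W2):
    """Scale every channel by one learned weight in (0, 1).

    W1: hidden x C and W2: C x hidden are hypothetical 1x1-conv weights.
    """
    pooled = [sum(map(sum, ch)) / (len(ch) * len(ch[0])) for ch in feat]
    weights = [sigmoid(z) for z in matvec(W2, relu_vec(matvec(W1, pooled)))]
    return [[[x * wc for x in row] for row in ch] for ch, wc in zip(feat, weights)]


def pixel_attention(feat, W1, w2):
    """Scale every pixel location by one learned weight in (0, 1).

    W1: hidden x C and w2: vector of length hidden are hypothetical weights.
    """
    C, H, Wd = len(feat), len(feat[0]), len(feat[0][0])
    out = [[[0.0] * Wd for _ in range(H)] for _ in range(C)]
    for i in range(H):
        for j in range(Wd):
            v = [feat[c][i][j] for c in range(C)]
            hidden = relu_vec(matvec(W1, v))
            a = sigmoid(sum(w * h for w, h in zip(w2, hidden)))
            for c in range(C):
                out[c][i][j] = feat[c][i][j] * a
    return out


def feature_attention(feat, ca_params, pa_params):
    """Channel attention followed by pixel attention (assumed ordering)."""
    return pixel_attention(channel_attention(feat, *ca_params), *pa_params)
```

With all weights set to zero, both sigmoids output 0.5 and the module scales every value by 0.25, which is a convenient sanity check; trained weights would instead emphasize some channels and some pixel locations over others.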

Single Image Blind Deblurring Using Multi-Scale Latent Structure Prior

Jun 11, 2019

Yuanchao Bai, Huizhu Jia, Ming Jiang, Xianming Liu, Xiaodong Xie, Wen Gao

Abstract: Blind image deblurring is a challenging problem in computer vision, which aims to restore both the blur kernel and the latent sharp image from only a blurry observation. Inspired by the prevalent self-example prior in image super-resolution, in this paper, we observe that a coarse enough image down-sampled from a blurry observation is approximately a low-resolution version of the latent sharp image. We prove this phenomenon theoretically and define the coarse enough image as a latent structure prior of the unknown sharp image. Starting from this prior, we propose to restore sharp images from the coarsest scale to the finest scale on a blurry image pyramid, and progressively update the prior image using the newly restored sharp image. These coarse-to-fine priors are referred to as Multi-Scale Latent Structures (MSLS). Leveraging the MSLS prior, our algorithm comprises two phases: 1) we first preliminarily restore sharp images in the coarse scales; 2) we then apply a refinement process in the finest scale to obtain the final deblurred image. In each scale, to achieve lower computational complexity, we alternately perform a sharp image reconstruction with fast local self-example matching, an accelerated kernel estimation with error compensation, and a fast non-blind image deblurring, instead of computing any computationally expensive non-convex priors. We further extend the proposed algorithm to solve the more challenging non-uniform blind image deblurring problem. Extensive experiments demonstrate that our algorithm achieves competitive results against state-of-the-art methods with much faster running speed.

* To appear in IEEE Transactions on Circuits and Systems for Video Technology, 2019; Image downsampling makes a good prior for fast blind image deblurring

Graph-Based Blind Image Deblurring From a Single Photograph

Feb 22, 2018

Yuanchao Bai, Gene Cheung, Xianming Liu, Wen Gao

Abstract: Blind image deblurring, i.e., deblurring without knowledge of the blur kernel, is a highly ill-posed problem. The problem can be solved in two parts: i) estimate a blur kernel from the blurry image, and ii) given the estimated blur kernel, deconvolve the blurry input to restore the target image. In this paper, we propose a graph-based blind image deblurring algorithm by interpreting an image patch as a signal on a weighted graph. Specifically, we first argue that a skeleton image (a proxy that retains the strong gradients of the target but smooths out the details) can be used to accurately estimate the blur kernel and has a unique bi-modal edge weight distribution. Then, we design a reweighted graph total variation (RGTV) prior that can efficiently promote a bi-modal edge weight distribution given a blurry patch. Further, to analyze RGTV in the graph frequency domain, we introduce a new weight function to represent RGTV as a graph $l_1$-Laplacian regularizer. This leads to a graph spectral filtering interpretation of the prior with desirable properties, including robustness to noise and blur, strong piecewise smooth (PWS) filtering and sharpness promotion. Minimizing a blind image deblurring objective with RGTV results in a non-convex, non-differentiable optimization problem. We leverage the new graph spectral interpretation of RGTV to design an efficient algorithm that solves for the skeleton image and the blur kernel alternately. For Gaussian blur specifically, we propose a further speedup strategy using accelerated graph spectral filtering. Finally, with the computed blur kernel, recent non-blind image deblurring algorithms can be applied to restore the target image. Experimental results demonstrate that our algorithm successfully restores latent sharp images and outperforms state-of-the-art methods quantitatively and qualitatively.
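
To see why a reweighted graph total variation favors sharp, bi-modal signals, consider a minimal 1D sketch. The abstract does not give the formula, so this assumes the form $\sum_{(i,j)} w_{ij}(x)\,|x_i - x_j|$ with a Gaussian edge weight $w_{ij} = \exp(-(x_i - x_j)^2 / \sigma^2)$ that is recomputed from the signal itself; the path graph, the value of `sigma` and the function names are illustrative choices.

```python
import math


def graph_tv(x):
    """Plain total variation on a 1D path graph with unit edge weights."""
    return sum(abs(a - b) for a, b in zip(x, x[1:]))


def rgtv(x, sigma=0.3):
    """Reweighted graph TV on a 1D path graph (assumed Gaussian edge weights).

    Each edge cost d * exp(-d^2 / sigma^2) is small for d near 0 and for
    large d, so both flat regions and sharp jumps are cheap, while gradual
    ramps (typical of blur) are penalized.
    """
    total = 0.0
    for a, b in zip(x, x[1:]):
        d = abs(a - b)
        total += math.exp(-(d * d) / (sigma * sigma)) * d
    return total


sharp = [0.0, 0.0, 0.0, 1.0, 1.0, 1.0]      # step edge
blurred = [0.0, 0.2, 0.4, 0.6, 0.8, 1.0]    # same edge after smoothing
```

On these two signals the plain total variation is the same, whereas the reweighted version assigns a much lower cost to the step edge than to the ramp, which is the sharpness-promoting behavior the abstract attributes to RGTV.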

Blind Image Deblurring via Reweighted Graph Total Variation

Dec 24, 2017

Yuanchao Bai, Gene Cheung, Xianming Liu, Wen Gao

* 5 pages, submitted to IEEE International Conference on Acoustics, Speech and Signal Processing, Calgary, Alberta, Canada, April, 2018
