Xiaogang Xu

High Dynamic Range Image Reconstruction via Deep Explicit Polynomial Curve Estimation

Jul 31, 2023
Jiaqi Tang, Xiaogang Xu, Sixing Hu, Ying-Cong Chen

Due to limited camera capacities, digital images usually have a narrower dynamic illumination range than real-world scene radiance. To resolve this problem, High Dynamic Range (HDR) reconstruction is proposed to recover the dynamic range and better represent real-world scenes. However, due to different physical imaging parameters, the tone-mapping functions between images and real radiance are highly diverse, which makes HDR reconstruction extremely challenging. Existing solutions cannot explicitly clarify the correspondence between the tone-mapping function and the generated HDR image, yet this relationship is vital for guiding HDR reconstruction. To address this problem, we propose a method that explicitly estimates the tone-mapping function and its corresponding HDR image in one network. First, based on the characteristics of the tone-mapping function, we model the trend of the tone curve with a polynomial. To fit this curve, we use a learnable network to estimate the polynomial coefficients. The curve is automatically adjusted to the tone space of the Low Dynamic Range (LDR) image and used to reconstruct the real HDR image. In addition, since current datasets do not provide the correspondence between the tone-mapping function and the LDR image, we construct a new dataset with both synthetic and real images. Extensive experiments show that our method generalizes well across different tone-mapping functions and achieves SOTA performance.
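As a concrete illustration of the curve model (a minimal sketch; the polynomial degree, the coefficient values, and the coefficient-estimation step are assumptions, not the paper's exact formulation), the predicted coefficients define a per-image tone curve that maps normalized LDR intensities toward HDR radiance:

```python
import numpy as np

def apply_polynomial_curve(ldr, coeffs):
    """Map normalized LDR intensities to HDR radiance with a polynomial curve.

    ldr    : float array in [0, 1], any shape (e.g., H x W x 3).
    coeffs : polynomial coefficients [c1, ..., cK] for
             hdr = c1*ldr + c2*ldr**2 + ... + cK*ldr**K  (degree K is a choice here).
    """
    hdr = np.zeros_like(ldr)
    for k, c in enumerate(coeffs, start=1):
        hdr += c * ldr ** k
    return hdr

# Toy example: a hypothetical degree-3 curve predicted by the coefficient network.
ldr = np.random.rand(4, 4, 3).astype(np.float32)
coeffs = [0.6, 0.1, 0.3]          # placeholder values, not learned ones
hdr = apply_polynomial_curve(ldr, coeffs)
```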


Lighting up NeRF via Unsupervised Decomposition and Enhancement

Jul 20, 2023
Haoyuan Wang, Xiaogang Xu, Ke Xu, Rynson W.H. Lau

Neural Radiance Field (NeRF) is a promising approach for synthesizing novel views, given a set of images and the corresponding camera poses of a scene. However, images photographed in a low-light scene can hardly be used to train a NeRF model to produce high-quality results, due to their low pixel intensities, heavy noise, and color distortion. Combining existing low-light image enhancement methods with NeRF also does not work well, due to the view inconsistency caused by the individual 2D enhancement process. In this paper, we propose a novel approach, called Low-Light NeRF (LLNeRF), to enhance the scene representation and synthesize normal-light novel views directly from sRGB low-light images in an unsupervised manner. The core of our approach is a decomposition of radiance field learning, which allows us to enhance illumination, reduce noise, and correct distorted colors jointly with the NeRF optimization process. Given a collection of camera-finished low dynamic range (8 bits/channel) images of a low-light scene, our method produces novel-view images with proper lighting and vivid colors and details. Experiments demonstrate that our method outperforms existing low-light enhancement methods and NeRF methods.
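A minimal sketch of the decomposition idea (a Retinex-style recombination; the variable split, gain, and gamma below are assumptions for illustration, not LLNeRF's learned formulation): the lighting-related component of each sampled point is brightened before recomposing the color.

```python
import numpy as np

def enhance_decomposed_color(reflectance, illumination, gain=4.0, gamma=0.6):
    """Recombine a color decomposition with brightened illumination.

    reflectance  : view-independent color-like component, shape (N, 3), roughly in [0, 1].
    illumination : lighting-related scalar per sample, shape (N, 1).
    gain, gamma  : hand-picked enhancement parameters (assumptions, not learned).
    """
    enhanced_illum = np.clip(gain * illumination ** gamma, 0.0, 1.0)
    return np.clip(reflectance * enhanced_illum, 0.0, 1.0)

# Toy decomposition for a batch of points sampled along rays of a dark scene.
refl = np.random.rand(8, 3)
illum = 0.1 * np.random.rand(8, 1)   # dim lighting, as in a low-light capture
rgb = enhance_decomposed_color(refl, illum)
```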

* ICCV 2023. Project website: https://whyy.site/paper/llnerf 

Low-Light Image Enhancement via Structure Modeling and Guidance

May 10, 2023
Xiaogang Xu, Ruixing Wang, Jiangbo Lu

This paper proposes a new framework for low-light image enhancement that simultaneously models appearance and structure. It employs structural features to guide the appearance enhancement, leading to sharp and realistic results. Structure modeling in our framework is implemented as edge detection in low-light images, achieved with a modified generative model for which we design a structure-aware feature extractor and generator. The detected edge maps accurately emphasize the essential structural information, and the edge prediction is robust to noise in dark regions. Moreover, to improve the appearance modeling, which is implemented with a simple U-Net, a novel structure-guided enhancement module with structure-guided feature synthesis layers is proposed. The appearance model, edge detector, and enhancement module are trained end-to-end. Experiments on representative datasets (sRGB and RAW domains) show that our model consistently achieves SOTA performance on all datasets with the same architecture.
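To make the guidance idea concrete (a toy sketch only; the layer sizes and the scale/shift modulation form are assumptions in the spirit of SFT-style conditioning, not the paper's exact module), an edge map from the structure branch can modulate appearance features as follows:

```python
import torch
import torch.nn as nn

class StructureGuidedModulation(nn.Module):
    """Toy structure-guided layer: an edge map predicts a per-pixel scale and
    shift that modulate appearance features (sizes are illustrative guesses)."""

    def __init__(self, feat_channels=32):
        super().__init__()
        self.to_scale = nn.Sequential(nn.Conv2d(1, feat_channels, 3, padding=1), nn.ReLU(),
                                      nn.Conv2d(feat_channels, feat_channels, 3, padding=1))
        self.to_shift = nn.Sequential(nn.Conv2d(1, feat_channels, 3, padding=1), nn.ReLU(),
                                      nn.Conv2d(feat_channels, feat_channels, 3, padding=1))

    def forward(self, appearance_feat, edge_map):
        return appearance_feat * (1 + self.to_scale(edge_map)) + self.to_shift(edge_map)

feat = torch.randn(1, 32, 64, 64)      # features from the appearance U-Net
edges = torch.rand(1, 1, 64, 64)       # edge map from the structure branch
out = StructureGuidedModulation(32)(feat, edges)
```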


Leaf Cultivar Identification via Prototype-enhanced Learning

May 05, 2023
Yiyi Zhang, Zhiwen Ying, Ying Zheng, Cuiling Wu, Nannan Li, Jun Wang, Xianzhong Feng, Xiaogang Xu

Plant leaf identification is crucial for biodiversity protection and conservation and has gradually attracted the attention of academia in recent years. Due to the high similarity among different varieties, leaf cultivar recognition is also considered an ultra-fine-grained visual classification (UFGVC) task, which poses a significant challenge. In practice, an instance may be related to multiple varieties to varying degrees, especially in UFGVC datasets. However, deep learning methods trained on one-hot labels fail to reflect patterns shared across categories and thus perform poorly on this task. To address this issue, we generate soft targets integrated with inter-class similarity information. Specifically, we continuously update the prototypical features for each category and then compute the similarity scores between instances and prototypes. The original one-hot labels and the similarity scores are combined to yield enhanced labels. Prototype-enhanced soft labels not only retain the original one-hot label information but also introduce rich inter-category semantic associations, thus providing more effective supervision for deep model training. Extensive experimental results on public datasets show that our method significantly improves performance on the UFGVC task of leaf cultivar identification.
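A minimal sketch of the label-enhancement step (the cosine similarity, temperature, and mixing weight below are assumptions, and the online prototype-update schedule is omitted): one-hot labels are blended with softmax-normalized instance-prototype similarities.

```python
import numpy as np

def prototype_enhanced_labels(features, labels, prototypes, alpha=0.5, temperature=0.1):
    """Blend one-hot labels with instance-prototype similarity scores.

    features   : (N, D) instance embeddings.
    labels     : (N,) integer class ids.
    prototypes : (C, D) running per-class prototype features.
    alpha, temperature : mixing weight and softmax temperature (assumed values).
    """
    num_classes = prototypes.shape[0]
    one_hot = np.eye(num_classes)[labels]
    # Cosine similarity between each instance and each class prototype.
    f = features / np.linalg.norm(features, axis=1, keepdims=True)
    p = prototypes / np.linalg.norm(prototypes, axis=1, keepdims=True)
    soft = np.exp(f @ p.T / temperature)
    soft /= soft.sum(axis=1, keepdims=True)
    return alpha * one_hot + (1 - alpha) * soft

feats = np.random.randn(4, 16)
protos = np.random.randn(5, 16)          # e.g., 5 cultivars
targets = prototype_enhanced_labels(feats, np.array([0, 2, 2, 4]), protos)
```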


TriVol: Point Cloud Rendering via Triple Volumes

Mar 29, 2023
Tao Hu, Xiaogang Xu, Ruihang Chu, Jiaya Jia

Existing learning-based methods for point cloud rendering adopt various 3D representations and feature-querying mechanisms to alleviate the sparsity problem of point clouds. However, artifacts still appear in the rendered images, due to the difficulty of extracting continuous and discriminative 3D features from point clouds. In this paper, we present a dense yet lightweight 3D representation, named TriVol, that can be combined with NeRF to render photo-realistic images from point clouds. TriVol consists of three slim volumes, each encoded from the point cloud, and has two advantages. First, it fuses the respective fields at different scales and thus extracts local and non-local features for a discriminative representation. Second, since the volume size is greatly reduced, our 3D decoder runs efficiently, allowing us to increase the resolution of the 3D space and render more point details. Extensive experiments on different benchmarks with various kinds of scenes and objects demonstrate our framework's effectiveness compared with current approaches. Moreover, our framework generalizes well to rendering a category of scenes or objects without fine-tuning.
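A toy illustration of querying a feature from three axis-aligned slim volumes (nearest-neighbor lookup stands in for interpolation, and the shapes and concatenation fusion are assumptions, not TriVol's actual design):

```python
import numpy as np

def query_trivol(point, vol_x, vol_y, vol_z):
    """Fetch features for a point in [0, 1]^3 from three slim, axis-aligned
    volumes and fuse them by concatenation (toy version)."""
    feats = []
    for vol in (vol_x, vol_y, vol_z):          # each volume: (C, D1, D2, D3)
        idx = [int(point[a] * (vol.shape[a + 1] - 1)) for a in range(3)]
        feats.append(vol[:, idx[0], idx[1], idx[2]])
    return np.concatenate(feats)

# Three slim volumes, each high-resolution along two axes and thin along one.
vx = np.random.randn(8, 4, 64, 64)
vy = np.random.randn(8, 64, 4, 64)
vz = np.random.randn(8, 64, 64, 4)
feature = query_trivol(np.array([0.3, 0.7, 0.5]), vx, vy, vz)
```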


Point2Pix: Photo-Realistic Point Cloud Rendering via Neural Radiance Fields

Mar 29, 2023
Tao Hu, Xiaogang Xu, Shu Liu, Jiaya Jia

Synthesizing photo-realistic images from a point cloud is challenging because of the sparsity of the point cloud representation. Recent Neural Radiance Fields and their extensions synthesize realistic images from 2D inputs. In this paper, we present Point2Pix, a novel point renderer that links sparse 3D point clouds with dense 2D image pixels. Taking advantage of the 3D point cloud prior and the NeRF rendering pipeline, our method synthesizes high-quality images from colored point clouds, generally for novel indoor scenes. To improve the efficiency of ray sampling, we propose point-guided sampling, which focuses on valid samples. We also present Point Encoding to build Multi-scale Radiance Fields that provide discriminative 3D point features. Finally, we propose Fusion Encoding to efficiently synthesize high-quality images. Extensive experiments on the ScanNet and ArkitScenes datasets demonstrate the effectiveness and generalization of our method.
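A minimal sketch of point-guided ray sampling (the radius, jitter scheme, and uniform fallback are arbitrary illustrative choices, not the paper's procedure): depths are sampled only where the ray passes close to existing points.

```python
import numpy as np

def point_guided_samples(ray_o, ray_d, points, radius=0.05, n_samples=8):
    """Keep ray depths whose positions fall near existing point-cloud points,
    instead of sampling the whole ray uniformly (toy version)."""
    ray_d = ray_d / np.linalg.norm(ray_d)
    # Depth of each point's closest approach to the ray, and its distance from the ray.
    t = (points - ray_o) @ ray_d
    dist = np.linalg.norm(points - (ray_o + t[:, None] * ray_d), axis=1)
    valid_t = t[(dist < radius) & (t > 0)]
    if valid_t.size == 0:
        return np.linspace(0.1, 5.0, n_samples)      # fall back to uniform samples
    # Jitter a few samples around each nearby point depth.
    return np.sort(np.concatenate([vt + np.linspace(-radius, radius, n_samples // 2)
                                   for vt in valid_t]))

cloud = np.random.rand(1000, 3) * 2.0
depths = point_guided_samples(np.zeros(3), np.array([0.3, 0.4, 0.86]), cloud)
```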


Photo-Realistic Out-of-domain GAN inversion via Invertibility Decomposition

Dec 19, 2022
Xin Yang, Xiaogang Xu, Yingcong Chen

The fidelity of Generative Adversarial Network (GAN) inversion is impeded by Out-Of-Domain (OOD) areas (e.g., background, accessories) in the image. Detecting the OOD areas beyond the generation ability of the pretrained model and blending these regions with the input image can enhance fidelity. The "invertibility mask" identifies these OOD areas, and existing methods predict the mask from the reconstruction error. However, the estimated mask is usually inaccurate due to the influence of the reconstruction error in the In-Domain (ID) area. In this paper, we propose a novel framework that enhances the fidelity of human face inversion with a new module that decomposes the input image into ID and OOD partitions using invertibility masks. Unlike previous works, our invertibility detector is learned jointly with a spatial alignment module. We iteratively align the generated features to the input geometry and reduce the reconstruction error in the ID regions, so the OOD areas become more distinguishable and can be precisely predicted. We then improve the fidelity of our results by blending the OOD areas from the input image with the ID GAN inversion results. Our method produces photo-realistic results for real-world human face image inversion and manipulation. Extensive experiments demonstrate our method's superiority over existing methods in the quality of GAN inversion and attribute manipulation.
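The final blending step can be written compactly (a sketch; predicting the mask is the learned part, and the soft-mask convention below is an assumption):

```python
import numpy as np

def blend_with_invertibility_mask(input_img, inverted_img, ood_mask):
    """Composite: keep OOD regions from the input photo and ID regions from the
    GAN inversion result. ood_mask is in [0, 1], with 1 where the generator
    cannot reproduce the content."""
    ood_mask = ood_mask[..., None]                 # broadcast over RGB channels
    return ood_mask * input_img + (1.0 - ood_mask) * inverted_img

img = np.random.rand(256, 256, 3)
inv = np.random.rand(256, 256, 3)
mask = (np.random.rand(256, 256) > 0.8).astype(np.float32)   # placeholder mask
out = blend_with_invertibility_mask(img, inv, mask)
```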


General Adversarial Defense Against Black-box Attacks via Pixel Level and Feature Level Distribution Alignments

Dec 11, 2022
Xiaogang Xu, Hengshuang Zhao, Philip Torr, Jiaya Jia

Deep Neural Networks (DNNs) are vulnerable to black-box adversarial attacks, which are highly transferable. This threat stems from the distribution gap between adversarial and clean samples in the feature space of the target DNNs. In this paper, we use Deep Generative Networks (DGNs) with a novel training mechanism to eliminate this distribution gap. The trained DGNs align the distribution of adversarial samples with that of clean ones for the target DNNs by translating pixel values. Different from previous work, we propose a more effective pixel-level training constraint to make this achievable, thus enhancing robustness to adversarial samples. Furthermore, a class-aware feature-level constraint is formulated for integrated distribution alignment. Our approach is general and applicable to multiple tasks, including image classification, semantic segmentation, and object detection. We conduct extensive experiments on different datasets, and our strategy demonstrates its effectiveness and generality against black-box attacks.
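A toy rendering of the two alignment levels (the exact constraints in the paper may differ; the L1/MSE choices and class-mean matching below are assumptions for illustration):

```python
import torch
import torch.nn.functional as F

def alignment_losses(translated, clean, feat_translated, feat_clean, labels, num_classes):
    """Pixel-level alignment on translated images plus a class-aware
    feature-level alignment between per-class mean features (toy version)."""
    pixel_loss = F.l1_loss(translated, clean)
    feat_loss = translated.new_tensor(0.0)
    for c in range(num_classes):
        idx = labels == c
        if idx.any():
            feat_loss = feat_loss + F.mse_loss(feat_translated[idx].mean(0),
                                               feat_clean[idx].mean(0))
    return pixel_loss, feat_loss / num_classes

x_t, x_c = torch.rand(8, 3, 32, 32), torch.rand(8, 3, 32, 32)   # DGN output vs. clean
f_t, f_c = torch.randn(8, 128), torch.randn(8, 128)             # target-DNN features
y = torch.randint(0, 10, (8,))
lp, lf = alignment_losses(x_t, x_c, f_t, f_c, y, 10)
```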


Towards Local Underexposed Photo Enhancement

Aug 17, 2022
Yizhan Huang, Xiaogang Xu

Inspired by the ability of deep generative models to generate highly realistic images, much recent work has made progress in enhancing underexposed images globally. However, local image enhancement has not been explored, although it is required in real-world scenarios, e.g., fixing local underexposure. In this work, we define a new task setting for underexposed image enhancement in which users control which region to brighten via an input mask. As indicated by the mask, an image can be divided into three areas: Masked Area A, Transition Area B, and Unmasked Area C. Area A should be brightened to the desired lighting, and there should be a smooth transition (Area B) from the brightened area (Area A) to the unchanged region (Area C). To accomplish this task, we propose two methods: concatenating the mask as additional channels (MConcat) and Mask-based Normalization (MNorm). While MConcat simply appends the mask channels to the input image, MNorm dynamically enhances the spatially varying pixels, ensuring that the enhanced images are consistent with the requirement indicated by the input mask. Moreover, MConcat serves as a plug-and-play module that can be incorporated into existing networks for global enhancement to achieve local enhancement. The overall network can be trained with three loss functions over Area A, Area B, and Area C, which are unified across model structures. We perform extensive experiments on public datasets with various parametric approaches to low-light enhancement, including Convolutional-Neural-Network-based and Transformer-based models, demonstrating the effectiveness of our methods.
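A minimal MConcat sketch (the backbone below is a placeholder network, not one of the enhancement models used in the paper): the user mask is appended as an extra input channel so an existing global enhancement network can be steered toward local enhancement.

```python
import torch
import torch.nn as nn

class MConcatWrapper(nn.Module):
    """Wrap an enhancement backbone so it also receives the user mask as input."""

    def __init__(self, backbone):
        super().__init__()
        self.backbone = backbone

    def forward(self, image, mask):
        return self.backbone(torch.cat([image, mask], dim=1))

# A tiny placeholder backbone expecting 3 image channels + 1 mask channel.
backbone = nn.Sequential(nn.Conv2d(4, 16, 3, padding=1), nn.ReLU(),
                         nn.Conv2d(16, 3, 3, padding=1))
model = MConcatWrapper(backbone)
out = model(torch.rand(1, 3, 128, 128), torch.rand(1, 1, 128, 128))
```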


DecoupleNet: Decoupled Network for Domain Adaptive Semantic Segmentation

Jul 20, 2022
Xin Lai, Zhuotao Tian, Xiaogang Xu, Yingcong Chen, Shu Liu, Hengshuang Zhao, Liwei Wang, Jiaya Jia

Unsupervised domain adaptation in semantic segmentation has been proposed to alleviate the reliance on expensive pixel-wise annotations. It leverages a labeled source-domain dataset and unlabeled target-domain images to learn a segmentation network. In this paper, we observe two main issues with the existing domain-invariant learning framework: (1) distracted by the feature distribution alignment, the network cannot focus on the segmentation task; and (2) fitting the source-domain data well compromises target-domain performance. To address these issues, we propose DecoupleNet, which alleviates source-domain overfitting and lets the final model focus more on the segmentation task. Furthermore, we put forward Self-Discrimination (SD) and introduce an auxiliary classifier to learn more discriminative target-domain features with pseudo labels. Finally, we propose Online Enhanced Self-Training (OEST) to contextually enhance the quality of pseudo labels in an online manner. Experiments show that our method outperforms existing state-of-the-art methods, and extensive ablation studies verify the effectiveness of each component. Code is available at https://github.com/dvlab-research/DecoupleNet.
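For context, a plain confidence-thresholded pseudo-labelling step is sketched below; this is a generic self-training baseline, not the paper's SD or OEST procedure, and the threshold and class count are arbitrary.

```python
import torch

def confidence_pseudo_labels(logits, threshold=0.9, ignore_index=255):
    """Keep per-pixel predictions on unlabeled target images only where the
    softmax confidence exceeds a threshold; mark the rest as ignored."""
    probs = torch.softmax(logits, dim=1)
    conf, labels = probs.max(dim=1)
    labels[conf < threshold] = ignore_index
    return labels

target_logits = torch.randn(2, 19, 64, 128)     # e.g., 19 Cityscapes classes
pseudo = confidence_pseudo_labels(target_logits)
```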

* Accepted to ECCV 2022. Code is available at https://github.com/dvlab-research/DecoupleNet 