Qin Zou

NePF: Neural Photon Field for Single-Stage Inverse Rendering

Nov 20, 2023
Tuen-Yue Tsui, Qin Zou

We present Neural Photon Field (NePF), a novel single-stage framework for the ill-posed problem of inverse rendering from multi-view images. In contrast to previous methods that recover geometry, material, and illumination in multiple stages and extract these properties from separate multi-layer perceptrons across different neural fields, we question such complexity and introduce a single-stage framework that recovers all properties in a unified manner. NePF achieves this unification by fully exploiting the physical meaning behind the weight function of neural implicit surfaces and the view-dependent radiance. Moreover, we introduce a coordinate-based illumination model for rapid physically-based volume rendering, and we regularize this illumination with a subsurface scattering model for diffuse estimation. We evaluate our method on both real and synthetic datasets. The results demonstrate the superiority of our approach in recovering high-fidelity geometry and visually plausible material attributes.
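
The weight function the abstract refers to is, in standard neural-field renderers, the product of accumulated transmittance and per-segment opacity along each camera ray. A minimal PyTorch sketch of that standard formulation (not NePF's exact derivation) is:

```python
import torch

def render_along_ray(sigma, radiance, deltas):
    """sigma: (N,) densities; radiance: (N, 3) view-dependent colors;
    deltas: (N,) distances between consecutive samples along the ray."""
    alpha = 1.0 - torch.exp(-sigma * deltas)                    # per-segment opacity
    trans = torch.cumprod(
        torch.cat([torch.ones(1), 1.0 - alpha + 1e-10], dim=0), dim=0
    )[:-1]                                                      # accumulated transmittance T_i
    weights = trans * alpha                                     # w_i = T_i * alpha_i
    color = (weights[:, None] * radiance).sum(dim=0)            # composited pixel color
    return color, weights
```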

Domain Adaptation based Enhanced Detection for Autonomous Driving in Foggy and Rainy Weather

Jul 20, 2023
Jinlong Li, Runsheng Xu, Jin Ma, Qin Zou, Jiaqi Ma, Hongkai Yu

Object detection methods for autonomous driving that rely on supervised learning typically assume a consistent feature distribution between training and testing data; however, this assumption may fail under different weather conditions. Due to the domain gap, a detection model trained under clear weather may not perform well in foggy and rainy conditions. Overcoming detection bottlenecks in foggy and rainy weather is a real challenge for autonomous vehicles deployed in the wild. To bridge the domain gap and improve object detection performance in foggy and rainy weather, this paper presents a novel framework for domain-adaptive object detection. Adaptation at both the image level and the object level is used to minimize the differences in image style and object appearance between domains. Furthermore, to improve the model's performance on challenging examples, we introduce a novel adversarial gradient reversal layer that conducts adversarial mining on difficult instances in addition to domain adaptation. Additionally, we generate an auxiliary domain through data augmentation to enforce a new domain-level metric regularization. Experimental results on a public V2V benchmark show a substantial improvement in object detection for foggy and rainy driving scenarios.
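
The adversarial part of such frameworks is typically built on a gradient reversal layer (GRL). Below is a minimal PyTorch sketch of a plain GRL; the paper's adversarial gradient reversal layer additionally mines hard examples, which is not reproduced here.

```python
import torch

class GradReverse(torch.autograd.Function):
    """Identity in the forward pass, reversed (and scaled) gradient in the backward
    pass, so the feature extractor learns to confuse the domain discriminator."""

    @staticmethod
    def forward(ctx, x, lambd=1.0):
        ctx.lambd = lambd
        return x.view_as(x)

    @staticmethod
    def backward(ctx, grad_output):
        return -ctx.lambd * grad_output, None

def grad_reverse(x, lambd=1.0):
    return GradReverse.apply(x, lambd)
```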

* Only the title of this paper has been changed 

S2R-ViT for Multi-Agent Cooperative Perception: Bridging the Gap from Simulation to Reality

Jul 18, 2023
Jinlong Li, Runsheng Xu, Xinyu Liu, Baolu Li, Qin Zou, Jiaqi Ma, Hongkai Yu

Due to the lack of real multi-agent data and the time-consuming nature of labeling, existing multi-agent cooperative perception algorithms usually rely on simulated sensor data for training and validation. However, the perception performance degrades when these simulation-trained models are deployed in the real world, due to the significant domain gap between simulated and real data. In this paper, we propose the first simulation-to-reality transfer learning framework for multi-agent cooperative perception using a novel Vision Transformer, named S2R-ViT, which considers both the Implementation Gap and the Feature Gap between simulated and real data. We investigate the effects of these two types of domain gaps and propose a novel uncertainty-aware vision transformer to effectively relieve the Implementation Gap, and an agent-based feature adaptation module with inter-agent and ego-agent discriminators to reduce the Feature Gap. Extensive experiments on the public multi-agent cooperative perception datasets OPV2V and V2V4Real demonstrate that the proposed S2R-ViT can effectively bridge the gap from simulation to reality and significantly outperforms other methods for point cloud-based 3D object detection.
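
A hedged sketch of what the two discriminators in the agent-based feature adaptation module might look like; the channel sizes and module names are illustrative assumptions, not the released implementation.

```python
import torch.nn as nn

class DomainDiscriminator(nn.Module):
    """Predicts whether an intermediate feature map comes from simulated or real data."""

    def __init__(self, channels=256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(channels, 256, kernel_size=1), nn.ReLU(inplace=True),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(256, 1),                     # logit: simulated (1) vs. real (0)
        )

    def forward(self, feat):
        return self.net(feat)

ego_discriminator = DomainDiscriminator()      # judges the ego vehicle's own features
inter_discriminator = DomainDiscriminator()    # judges features shared by cooperating agents
```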

* Corrected the compile error in Fig. 5 

Coarse-to-fine Task-driven Inpainting for Geoscience Images

Dec 06, 2022
Huiming Sun, Jin Ma, Qing Guo, Qin Zou, Shaoyue Song, Yuewei Lin, Hongkai Yu

The processing and recognition of geoscience images have wide applications. Most existing research focuses on understanding high-quality geoscience images under the assumption that all images are clear. In many real-world cases, however, geoscience images may contain occlusions introduced during image acquisition. This corresponds to the image inpainting problem in computer vision and multimedia. To the best of our knowledge, existing image inpainting algorithms learn to repair occluded regions only for better visual quality; they are excellent for natural images but not good enough for geoscience images, as they ignore the geoscience-related tasks. This paper aims to repair occluded regions for better geoscience task performance while simultaneously achieving high visual quality, without changing the currently deployed deep-learning-based geoscience models. Because of the complex context of geoscience images, we propose a coarse-to-fine encoder-decoder network with coarse-to-fine adversarial context discriminators to reconstruct the occluded image regions. Due to the limited amount of geoscience image data, we use a MaskMix-based data augmentation method to exploit more information from the limited data. Experimental results on three public geoscience datasets, covering remote sensing scene recognition, cross-view geolocation, and semantic segmentation, show the effectiveness and accuracy of the proposed method.
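
The task-driven objective the abstract describes can be read as a reconstruction term plus a term that keeps the frozen, already-deployed geoscience model accurate on the inpainted result. A minimal sketch under that reading (the weights and helper names are assumptions):

```python
import torch.nn.functional as F

def task_driven_loss(inpainted, target, task_model, task_label, w_rec=1.0, w_task=0.1):
    """task_model is the deployed geoscience network; its parameters are assumed
    frozen, so this gradient only trains the inpainting network."""
    rec = F.l1_loss(inpainted, target)               # pixel-level reconstruction
    logits = task_model(inpainted)                   # e.g., scene-recognition logits
    task = F.cross_entropy(logits, task_label)       # keep the downstream task accurate
    return w_rec * rec + w_task * task
```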

Line Drawing Guided Progressive Inpainting of Mural Damages

Nov 12, 2022
Luxi Li, Qin Zou, Fan Zhang, Hongkai Yu, Long Chen, Chengfang Song, Xianfeng Huang, Xiaoguang Wang

Mural image inpainting refers to repairing damaged or missing areas in a mural image to restore its visual appearance. Most existing image inpainting methods take a target image as the only input and directly repair the damage to generate a visually plausible result. Such methods achieve high performance in restoring or completing specific objects, e.g., human faces, fabric textures, and printed text, but are not suitable for repairing murals with varied subjects, especially murals with large damaged areas. Moreover, due to the discrete colors in paints, mural inpainting may suffer from apparent color bias compared to natural image inpainting. To this end, we propose a line drawing guided progressive mural inpainting method. It divides the inpainting process into two steps, structure reconstruction and color correction, carried out by a structure reconstruction network (SRN) and a color correction network (CCN), respectively. In structure reconstruction, line drawings are used by the SRN as guidance to ensure large-scale content authenticity and structural stability. In color correction, the CCN performs local color adjustment for the missing pixels, which reduces the negative effects of color bias and edge jumping. The proposed approach is evaluated against current state-of-the-art image inpainting methods, and qualitative and quantitative results demonstrate its superiority in mural image inpainting. The code and data are available at https://github.com/qinnzou/mural-image-inpainting.
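
A hedged sketch of the two-step pipeline, with srn and ccn standing in for the structure reconstruction and color correction networks; the exact packing of inputs is an assumption for illustration.

```python
import torch

def progressive_inpaint(damaged, mask, line_drawing, srn, ccn):
    """damaged: (B,3,H,W) mural; mask: (B,1,H,W) with 1 = missing;
    line_drawing: (B,1,H,W) guidance sketch of the damaged region."""
    x = torch.cat([damaged * (1 - mask), mask, line_drawing], dim=1)
    structure = srn(x)                                     # step 1: structure reconstruction
    coarse = damaged * (1 - mask) + structure * mask
    refined = ccn(torch.cat([coarse, mask], dim=1))        # step 2: local color correction
    return damaged * (1 - mask) + refined * mask           # known pixels stay untouched
```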

Domain Adaptive Object Detection for Autonomous Driving under Foggy Weather

Oct 27, 2022
Jinlong Li, Runsheng Xu, Jin Ma, Qin Zou, Jiaqi Ma, Hongkai Yu

Most object detection methods for autonomous driving assume a consistent feature distribution between training and testing data, which does not hold when weather conditions differ significantly. An object detection model trained under clear weather might not be effective enough in foggy weather because of the domain gap. This paper proposes a novel domain adaptive object detection framework for autonomous driving under foggy weather. Our method leverages both image-level and object-level adaptation to diminish the domain discrepancy in image style and object appearance. To further enhance the model on challenging samples, we also propose a new adversarial gradient reversal layer that performs adversarial mining on hard examples together with domain adaptation. Moreover, we generate an auxiliary domain by data augmentation to enforce a new domain-level metric regularization. Experimental results on public benchmarks show the effectiveness and accuracy of the proposed method. The code is available at https://github.com/jinlong17/DA-Detect.
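
The two levels of adaptation correspond to domain classifiers on backbone feature maps (image level) and on pooled ROI features (object level). The sketch below shows the corresponding discriminator losses; the architectures and weighting are assumptions, and the released code linked above is the authoritative reference.

```python
import torch
import torch.nn.functional as F

def adaptation_losses(img_feat, roi_feats, img_disc, obj_disc, is_source):
    """img_feat: (B,C,H,W) backbone features; roi_feats: (N,C) pooled ROI features;
    is_source: 1.0 for clear-weather (source) samples, 0.0 for foggy (target) ones."""
    img_logit = img_disc(img_feat)                       # image-level domain logits
    obj_logit = obj_disc(roi_feats)                      # object-level domain logits
    loss_img = F.binary_cross_entropy_with_logits(
        img_logit, torch.full_like(img_logit, is_source))
    loss_obj = F.binary_cross_entropy_with_logits(
        obj_logit, torch.full_like(obj_logit, is_source))
    return loss_img + loss_obj    # back-propagated through a gradient reversal layer
```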

* Accepted by WACV2023. Code is available at https://github.com/jinlong17/DA-Detect 

Luminance Attentive Networks for HDR Image and Panorama Reconstruction

Sep 14, 2021
Hanning Yu, Wentao Liu, Chengjiang Long, Bo Dong, Qin Zou, Chunxia Xiao

Reconstructing a high dynamic range (HDR) image from a low dynamic range (LDR) image is a challenging, ill-posed problem. This paper proposes a luminance attentive network named LANet for HDR reconstruction from a single LDR image. Our method is based on two fundamental observations: (1) HDR images stored in relative luminance are scale-invariant, meaning an HDR image holds the same information when multiplied by any positive real number. Based on this observation, we propose a novel normalization method called "HDR calibration" for HDR images stored in relative luminance, which calibrates HDR images into a similar luminance scale according to the corresponding LDR images. (2) The main difference between HDR and LDR images lies in the under-/over-exposed areas, especially the highlight areas. Following this observation, we propose a luminance attention module with a two-stream structure for LANet to pay more attention to the under-/over-exposed areas. In addition, we propose an extended network called panoLANet for HDR panorama reconstruction from an LDR panorama, and build a dual-net structure for panoLANet to solve the distortion problem caused by the equirectangular projection. Extensive experiments show that LANet can reconstruct visually convincing HDR images and outperforms state-of-the-art approaches on all metrics in inverse tone mapping. An image-based lighting application with panoLANet further demonstrates that our method can simulate natural scene lighting using only an LDR panorama. Our source code is available at https://github.com/LWT3437/LANet.
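
A hedged sketch of one plausible reading of "HDR calibration": exploit the scale invariance of relative luminance by rescaling the HDR image so that its luminance over well-exposed pixels matches the linearized LDR input. The thresholds and exact rule are assumptions, not necessarily the paper's formula.

```python
import torch

def luminance(img):                                   # img: (3,H,W), Rec.709 weights
    return 0.2126 * img[0] + 0.7152 * img[1] + 0.0722 * img[2]

def calibrate_hdr(hdr, ldr, gamma=2.2):
    """hdr: (3,H,W) relative-luminance HDR; ldr: (3,H,W) in [0,1]."""
    ldr_lin = ldr.clamp(0, 1) ** gamma                # undo display gamma
    peak = ldr_lin.max(dim=0).values
    well_exposed = (peak > 0.05) & (peak < 0.95)      # skip crushed / clipped pixels
    scale = luminance(ldr_lin)[well_exposed].mean() / (luminance(hdr)[well_exposed].mean() + 1e-8)
    return hdr * scale                                # same content, aligned luminance scale
```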

Metric Learning for Anti-Compression Facial Forgery Detection

Mar 15, 2021
Shenhao Cao, Qin Zou, Xiuqing Mao, Zhongyuan Wang

Detecting facial forgery images and videos is an increasingly important topic in multimedia forensics. As forgery images and videos are usually compressed into formats such as JPEG and H.264 when circulating on the Internet, existing forgery-detection methods trained on uncompressed data often show significantly decreased performance in identifying them. To solve this problem, we propose a novel anti-compression facial forgery detection framework, which learns a compression-insensitive embedding feature space using both original and compressed forgeries. Specifically, our approach consists of two novel ideas: (i) extracting compression-insensitive features from both uncompressed and compressed forgeries using an adversarial learning strategy; and (ii) learning a robust partition by constructing a metric loss that reduces the distance between paired original and compressed images in the embedding space. Experimental results demonstrate that the proposed method is highly effective in handling both compressed and uncompressed facial forgery images.
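
A minimal sketch of the paired metric term described in idea (ii): pull the embedding of each compressed forgery toward the embedding of its uncompressed original. The normalization and weighting are assumptions; the adversarial part of idea (i) is not shown.

```python
import torch.nn.functional as F

def compression_metric_loss(emb_orig, emb_comp):
    """emb_orig, emb_comp: (B, D) embeddings of paired original/compressed images."""
    z1 = F.normalize(emb_orig, dim=1)
    z2 = F.normalize(emb_comp, dim=1)
    return ((z1 - z2) ** 2).sum(dim=1).mean()         # mean squared pairwise distance
```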

Transductive Zero-Shot Hashing for Multi-Label Image Retrieval

Nov 17, 2019
Qin Zou, Zheng Zhang, Ling Cao, Long Chen, Song Wang

Hash coding has been widely used in approximate nearest neighbor search for large-scale image retrieval. Given semantic annotations such as class labels and pairwise similarities of the training data, hashing methods can learn and generate effective and compact binary codes. Since newly introduced images may carry semantic labels undefined at training time, which we call unseen images, zero-shot hashing techniques have been studied. However, existing zero-shot hashing methods focus on the retrieval of single-label images and cannot handle multi-label images. In this paper, for the first time, a novel transductive zero-shot hashing method is proposed for multi-label unseen image retrieval. To predict the labels of the unseen/target data, a visual-semantic bridge is built via instance-concept coherence ranking on the seen/source data. Then, a pairwise similarity loss and a focal quantization loss are constructed for training a hashing model using both the seen/source and unseen/target data. Extensive evaluations on three popular multi-label datasets demonstrate that the proposed hashing method achieves significantly better results than competing methods.
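
Hedged sketch of the two training terms named above. The pairwise similarity loss follows the common inner-product formulation for deep hashing; the focal-style reweighting of the quantization term is an assumption about its general shape, not the paper's exact definition.

```python
import torch
import torch.nn.functional as F

def hashing_losses(codes, sim, gamma=2.0):
    """codes: (B, K) tanh-activated hash outputs in (-1, 1);
    sim: (B, B) binary pairwise similarity matrix (1 = images share a label)."""
    logits = 0.5 * codes @ codes.t()                     # scaled inner products
    pair_loss = F.binary_cross_entropy_with_logits(logits, sim.float())
    q_err = (codes.abs() - 1.0) ** 2                     # distance of each bit from +/-1
    focal_q = ((q_err.detach() ** gamma) * q_err).mean() # focus on bits far from binary
    return pair_loss, focal_q
```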

* 12 pages 

An End-to-End Network for Co-Saliency Detection in One Single Image

Oct 25, 2019
Yuanhao Yue, Qin Zou, Hongkai Yu, Qian Wang, Song Wang

As a common visual problem, co-saliency detection within a single image has not attracted enough attention and has not yet been well addressed. Existing methods often follow a bottom-up strategy to infer co-saliency in an image, where salient regions are first detected using visual primitives such as color and shape and then grouped and merged into a co-saliency map. However, co-saliency is intrinsically perceived in a complex manner, with bottom-up and top-down strategies combined in human vision. To deal with this problem, a novel end-to-end trainable network is proposed in this paper, which includes a backbone net and two branch nets. The backbone net uses ground-truth masks as top-down guidance for saliency prediction, while the two branch nets construct triplet proposals for feature organization and clustering, which drives the network to be sensitive to co-salient regions in a bottom-up way. To evaluate the proposed method, we construct a new dataset of 2,019 natural images with co-saliency in each image. Experimental results show that the proposed method achieves state-of-the-art accuracy with a running speed of 28 fps.
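
Hedged sketch of the bottom-up branch idea: a triplet objective over proposal features so that co-salient proposals cluster together and separate from background. How anchors, positives, and negatives are sampled here is an illustrative assumption.

```python
import torch.nn.functional as F

def cosaliency_triplet_loss(anchor, positive, negative, margin=0.3):
    """Each argument: (N, D) proposal features; positives depict the same co-salient
    object as the anchor, negatives come from background proposals."""
    d_pos = F.pairwise_distance(anchor, positive)
    d_neg = F.pairwise_distance(anchor, negative)
    return F.relu(d_pos - d_neg + margin).mean()
```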
