Zhongxuan Luo

Learning Heavily-Degraded Prior for Underwater Object Detection

Aug 24, 2023
Chenping Fu, Xin Fan, Jiewen Xiao, Wanqi Yuan, Risheng Liu, Zhongxuan Luo

Underwater object detection suffers from low detection performance because the distance- and wavelength-dependent imaging process yields evident image quality degradations such as haze-like effects, low visibility, and color distortions. We therefore commit to resolving underwater object detection under compounded environmental degradations. Typical approaches attempt to develop sophisticated deep architectures that generate high-quality images or features. However, these methods only work for limited ranges because imaging factors are unstable, too sensitive, or compounded. Unlike these approaches that cater to high-quality images or features, this paper seeks transferable prior knowledge from detector-friendly images. The prior guides detectors in removing degradations that interfere with detection. It is based on the statistical observation that the heavily degraded regions of detector-friendly (DFUI) images and underwater images exhibit evident feature distribution gaps, while their lightly degraded regions overlap. We therefore propose a residual feature transference module (RFTM) to learn a mapping between deep representations of the heavily degraded patches of DFUI and underwater images, and use this mapping as a heavily degraded prior (HDP) for underwater detection. Since the statistical properties are independent of image content, HDP can be learned without supervision from semantic labels and plugged into popular CNN-based feature extraction networks to improve their performance on underwater object detection. Without bells and whistles, evaluations on URPC2020 and UODD show that our methods outperform CNN-based detectors by a large margin. With higher speed and fewer parameters, our method still performs better than transformer-based detectors. Our code and the DFUI dataset can be found at https://github.com/xiaoDetection/Learning-Heavily-Degraed-Prior.
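
As a rough illustration of the idea, the sketch below implements a residual feature-transference module that maps underwater backbone features toward DFUI features and is trained without semantic labels; the layer sizes, channel count, and training loop are illustrative assumptions rather than the authors' exact configuration.

```python
# Minimal sketch of a residual feature-transference module (RFTM) in the
# spirit described above; sizes and training details are assumptions.
import torch
import torch.nn as nn

class RFTM(nn.Module):
    """Residual mapping from underwater backbone features toward
    detector-friendly (DFUI) backbone features."""
    def __init__(self, channels: int = 256):
        super().__init__()
        self.transfer = nn.Sequential(
            nn.Conv2d(channels, channels, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels, channels, kernel_size=3, padding=1),
        )

    def forward(self, feat_uw: torch.Tensor) -> torch.Tensor:
        # Residual correction: output = input + learned degradation removal.
        return feat_uw + self.transfer(feat_uw)

# Label-free training: align transferred underwater features with the
# DFUI features of the same (heavily degraded) patches.
rftm = RFTM(channels=256)
optimizer = torch.optim.Adam(rftm.parameters(), lr=1e-4)
feat_uw = torch.randn(4, 256, 32, 32)    # backbone features of underwater patches
feat_dfui = torch.randn(4, 256, 32, 32)  # backbone features of DFUI patches

optimizer.zero_grad()
loss = nn.functional.mse_loss(rftm(feat_uw), feat_dfui)
loss.backward()
optimizer.step()
```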

Bilevel Generative Learning for Low-Light Vision

Aug 07, 2023
Yingchi Liu, Zhu Liu, Long Ma, Jinyuan Liu, Xin Fan, Zhongxuan Luo, Risheng Liu

Recently, there has been a growing interest in constructing deep learning schemes for Low-Light Vision (LLV). Existing techniques primarily focus on designing task-specific and data-dependent vision models on the standard RGB domain, which inherently contain latent data associations. In this study, we propose a generic low-light vision solution by introducing a generative block to convert data from the RAW to the RGB domain. This novel approach connects diverse vision problems by explicitly depicting data generation, making it, to our knowledge, the first such approach in the field. To precisely characterize the latent correspondence between the generative procedure and the vision task, we establish a bilevel model with the parameters of the generative block defined as the upper level and the parameters of the vision task defined as the lower level. We further develop two types of learning strategies targeting different goals, namely low cost and high accuracy, to acquire a new bilevel generative learning paradigm. Through bilevel optimization on enhancement tasks, the generative block acquires strong generalization ability for other low-light vision tasks. Extensive experimental evaluations on three representative low-light vision tasks, namely enhancement, detection, and segmentation, fully demonstrate the superiority of our proposed approach. The code will be available at https://github.com/Yingchi1998/BGL.
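
The bilevel structure can be pictured with a minimal alternating scheme: a RAW-to-RGB generative block at the upper level and a vision model at the lower level. The sketch below is a simplification of a genuine bilevel solver, and the tiny modules, losses, and tensor shapes are assumptions for illustration only.

```python
# Compact sketch of the bilevel setup: generative block (upper level),
# vision task model (lower level). Alternating updates stand in for a
# full bilevel solver.
import torch
import torch.nn as nn

gen_block = nn.Conv2d(4, 3, kernel_size=1)             # RAW (4 Bayer planes) -> RGB
task_net = nn.Conv2d(3, 3, kernel_size=3, padding=1)   # stand-in for the vision model

opt_upper = torch.optim.Adam(gen_block.parameters(), lr=1e-4)
opt_lower = torch.optim.Adam(task_net.parameters(), lr=1e-4)

raw, target = torch.randn(2, 4, 64, 64), torch.randn(2, 3, 64, 64)

for _ in range(10):
    # Lower level: fit the vision task on images produced by the current generator.
    lower_loss = nn.functional.l1_loss(task_net(gen_block(raw).detach()), target)
    opt_lower.zero_grad(); lower_loss.backward(); opt_lower.step()

    # Upper level: update the generative block through the task loss.
    upper_loss = nn.functional.l1_loss(task_net(gen_block(raw)), target)
    opt_upper.zero_grad(); upper_loss.backward(); opt_upper.step()
```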

* Accepted by ACM MM 2023. The code will be available at https://github.com/Yingchi1998/BGL

Multi-interactive Feature Learning and a Full-time Multi-modality Benchmark for Image Fusion and Segmentation

Aug 04, 2023
Jinyuan Liu, Zhu Liu, Guanyao Wu, Long Ma, Risheng Liu, Wei Zhong, Zhongxuan Luo, Xin Fan

Multi-modality image fusion and segmentation play a vital role in autonomous driving and robotic operation. Early efforts focus on boosting the performance of only one task, e.g., fusion or segmentation, making it hard to reach the 'Best of Both Worlds'. To overcome this issue, in this paper we propose a Multi-interactive Feature learning architecture for image fusion and Segmentation, namely SegMiF, and exploit dual-task correlation to promote the performance of both tasks. SegMiF has a cascade structure, containing a fusion sub-network and a commonly used segmentation sub-network. By seamlessly bridging intermediate features between the two components, the knowledge learned from the segmentation task can effectively assist the fusion task. In turn, the benefited fusion network supports the segmentation network in performing more precisely. Besides, a hierarchical interactive attention block is established to ensure fine-grained mapping of all the vital information between the two tasks, so that the modality/semantic features can be fully mutually interactive. In addition, a dynamic weight factor is introduced to automatically adjust the corresponding weights of each task, which balances the interactive feature correspondence and breaks through the limitation of laborious tuning. Furthermore, we construct a smart multi-wave binocular imaging system and collect a full-time multi-modality benchmark with 15 annotated pixel-level categories for image fusion and segmentation. Extensive experiments on several public datasets and our benchmark demonstrate that the proposed method outputs visually appealing fused images and achieves on average 7.66% higher segmentation mIoU in real-world scenes than state-of-the-art approaches. The source code and benchmark are available at https://github.com/JinyuanLiu-CV/SegMiF.
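
To illustrate the dynamic weight factor, the sketch below balances fusion and segmentation losses with learnable factors instead of hand-tuned constants; the uncertainty-style weighting is an assumption chosen for illustration and is not necessarily SegMiF's exact formulation.

```python
# Sketch of dynamic task weighting for a fusion + segmentation objective.
import torch
import torch.nn as nn

class DynamicTaskWeight(nn.Module):
    def __init__(self, num_tasks: int = 2):
        super().__init__()
        self.log_vars = nn.Parameter(torch.zeros(num_tasks))  # one learnable factor per task

    def forward(self, losses):
        total = 0.0
        for i, loss in enumerate(losses):
            precision = torch.exp(-self.log_vars[i])
            # Larger learned variance -> smaller effective weight for that task.
            total = total + precision * loss + self.log_vars[i]
        return total

weighter = DynamicTaskWeight(num_tasks=2)
fusion_loss = torch.tensor(0.8, requires_grad=True)
seg_loss = torch.tensor(1.3, requires_grad=True)
total_loss = weighter([fusion_loss, seg_loss])
total_loss.backward()  # weighting factors and task losses are optimized jointly
```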

* Accepted by ICCV 2023. The source code and benchmark are available at https://github.com/JinyuanLiu-CV/SegMiF 

DPM-OT: A New Diffusion Probabilistic Model Based on Optimal Transport

Jul 21, 2023
Zezeng Li, ShengHao Li, Zhanpeng Wang, Na Lei, Zhongxuan Luo, Xianfeng Gu

Sampling from diffusion probabilistic models (DPMs) can be viewed as a piecewise distribution transformation, which generally requires hundreds or thousands of steps of the inverse diffusion trajectory to obtain a high-quality image. Recent progress in designing fast samplers for DPMs achieves a trade-off between sampling speed and sample quality by knowledge distillation or by adjusting the variance schedule or the denoising equation. However, these methods cannot be optimal in both aspects and often suffer from mode mixture when the number of steps is small. To tackle this problem, we regard inverse diffusion as an optimal transport (OT) problem between latents at different stages and propose DPM-OT, a unified learning framework for fast DPMs with a direct expressway represented by the OT map, which can generate high-quality samples within around 10 function evaluations. By calculating the semi-discrete optimal transport map between the data latents and white noise, we obtain an expressway from the prior distribution to the data distribution, while significantly alleviating the problem of mode mixture. In addition, we give the error bound of the proposed method, which theoretically guarantees the stability of the algorithm. Extensive experiments validate the effectiveness and advantages of DPM-OT in terms of speed and quality (FID and mode mixture), thus representing an efficient solution for generative modeling. Source code is available at https://github.com/cognaclee/DPM-OT
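
A minimal sketch of the underlying building block is shown below: a semi-discrete OT map from white noise to a discrete set of data latents, computed by stochastic ascent on the Kantorovich dual. Dimensions, step sizes, and sample counts are illustrative assumptions.

```python
# Semi-discrete OT: continuous source (white noise) -> discrete target latents.
import torch

latents = torch.randn(64, 8)               # discrete target latents y_j
weights = torch.full((64,), 1.0 / 64)      # target measure nu_j
h = torch.zeros(64)                        # dual potential, one value per latent

lr = 0.1
for _ in range(500):
    z = torch.randn(256, 8)                      # samples from the source (noise)
    cost = 0.5 * torch.cdist(z, latents) ** 2    # quadratic transport cost
    assign = torch.argmin(cost - h, dim=1)       # power-diagram cell of each sample
    cell_mass = torch.bincount(assign, minlength=64).float() / z.shape[0]
    h += lr * (weights - cell_mass)              # dual gradient step: nu_j - mu(cell_j)

def ot_map(z):
    """Send a noise sample directly to its assigned data latent."""
    cost = 0.5 * torch.cdist(z, latents) ** 2
    return latents[torch.argmin(cost - h, dim=1)]

print(ot_map(torch.randn(5, 8)).shape)  # torch.Size([5, 8])
```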

* Accepted by ICCV 2023

OT-Net: A Reusable Neural Optimal Transport Solver

Jun 14, 2023
Zezeng Li, Shenghao Li, Lianbao Jin, Na Lei, Zhongxuan Luo

With the widespread application of optimal transport (OT), its computation becomes essential, and various algorithms have emerged. However, existing methods either have low efficiency or cannot represent discontinuous maps. We therefore present OT-Net, a novel reusable neural OT solver, which first learns Brenier's height representation via a neural network to obtain the potential and then obtains the OT map by computing the gradient of the potential. The algorithm has two merits: 1) it can easily represent discontinuous maps, which allows it to match any target distribution with discontinuous supports and achieve sharp boundaries, largely eliminating mode collapse in generative models; 2) the OT map can be computed directly by the proposed algorithm when new target samples are added, which greatly improves the efficiency and reusability of the map. Moreover, we analyze the theoretical error bound of the algorithm and demonstrate the empirical success of our approach in image generation, color transfer, and domain adaptation.
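
The "potential first, map second" idea can be sketched as follows: a network approximates the Brenier potential and the transport map is obtained as its gradient via autograd. The plain MLP below is a stand-in chosen for brevity; it does not enforce the convexity that a dedicated architecture would.

```python
# Transport map as the gradient of a learned potential.
import torch
import torch.nn as nn

potential = nn.Sequential(
    nn.Linear(2, 64), nn.Softplus(),
    nn.Linear(64, 64), nn.Softplus(),
    nn.Linear(64, 1),
)

def transport_map(x: torch.Tensor) -> torch.Tensor:
    x = x.requires_grad_(True)
    u = potential(x).sum()
    (grad,) = torch.autograd.grad(u, x, create_graph=True)
    return grad  # T(x) = gradient of the Brenier potential at x

x = torch.randn(16, 2)
print(transport_map(x).shape)  # torch.Size([16, 2])
```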

A Task-guided, Implicitly-searched and Meta-initialized Deep Model for Image Fusion

May 25, 2023
Risheng Liu, Zhu Liu, Jinyuan Liu, Xin Fan, Zhongxuan Luo

Image fusion plays a key role in a variety of multi-sensor-based vision systems, especially for enhancing visual quality and/or extracting aggregated features for perception. However, most existing methods treat image fusion as an individual task, ignoring its underlying relationship with downstream vision problems. Furthermore, designing proper fusion architectures often requires substantial engineering labor, and current fusion approaches lack mechanisms to improve their flexibility and generalization ability. To mitigate these issues, we establish a Task-guided, Implicit-searched and Meta-initialized (TIM) deep model to address image fusion in challenging real-world scenarios. Specifically, we first propose a constrained strategy that incorporates information from downstream tasks to guide the unsupervised learning process of image fusion. Within this framework, we then design an implicit search scheme to automatically discover compact architectures for our fusion model with high efficiency. In addition, a pretext meta-initialization technique is introduced to leverage diverse fusion data to support fast adaptation to different kinds of image fusion tasks. Qualitative and quantitative experimental results on different categories of image fusion problems and related downstream tasks (e.g., visual enhancement and semantic understanding) substantiate the flexibility and effectiveness of our TIM. The source code will be available at https://github.com/LiuZhu-CV/TIMFusion.
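
The task-guided constraint can be pictured as an unsupervised fusion loss augmented with feedback from a (frozen) downstream task network, as in the sketch below; the toy networks, losses, and weighting are illustrative assumptions, not TIM's actual components.

```python
# Task-guided, unsupervised fusion objective (simplified).
import torch
import torch.nn as nn

fusion_net = nn.Conv2d(2, 1, kernel_size=3, padding=1)   # fuses two modalities
task_net = nn.Conv2d(1, 1, kernel_size=3, padding=1)     # frozen downstream model
for p in task_net.parameters():
    p.requires_grad_(False)

ir, vis = torch.rand(1, 1, 64, 64), torch.rand(1, 1, 64, 64)
task_target = torch.rand(1, 1, 64, 64)

fused = fusion_net(torch.cat([ir, vis], dim=1))
# Unsupervised fusion term: stay close to the strongest source response.
fusion_loss = nn.functional.l1_loss(fused, torch.max(ir, vis))
# Task-guided term: the fused image should also serve the downstream task.
task_loss = nn.functional.l1_loss(task_net(fused), task_target)

total = fusion_loss + 0.5 * task_loss
total.backward()
```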

* 16 pages, 12 figures, Codes are available at https://github.com/LiuZhu-CV/TIMFusion 

Practical Exposure Correction: Great Truths Are Always Simple

Dec 29, 2022
Long Ma, Tianjiao Ma, Xinwei Xue, Xin Fan, Zhongxuan Luo, Risheng Liu

Improving the visual quality of a degraded observation by correcting its exposure level is a fundamental task in the computer vision community. Existing works commonly lack adaptability to unknown scenes because of data-driven patterns (deep networks) or limited regularization (traditional optimization), and they usually require time-consuming inference. These two points heavily limit their practicability. In this paper, we establish a Practical Exposure Corrector (PEC) that combines efficiency and performance. Concretely, we rethink exposure correction to provide a linear solution with exposure-sensitive compensation. To generate the compensation, we introduce an exposure adversarial function as the key engine to fully extract valuable information from the observation. Applying this function, we construct a segmented shrinkage iterative scheme to generate the desired compensation. Its shrinkage nature provides powerful support for algorithmic stability and robustness. Extensive experimental evaluations fully reveal the superiority of our proposed PEC. The code is available at https://rsliu.tech/PEC.
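
The sketch below gives a heavily simplified picture of an iterative, shrinking compensation loop for exposure correction: each step estimates an exposure-sensitive compensation and applies it with a decaying step size. The specific update rule is an assumption for illustration and is not the exact PEC algorithm.

```python
# Toy iterative exposure compensation with a shrinking step size.
import torch

def correct_exposure(x: torch.Tensor, steps: int = 4, target: float = 0.5) -> torch.Tensor:
    """x: image in [0, 1]; returns an exposure-adjusted image."""
    y = x.clone()
    step_size = 1.0
    for _ in range(steps):
        # Exposure-sensitive compensation: darker regions receive a larger boost.
        compensation = (target - y.mean()) * (1.0 - y)
        y = (y + step_size * compensation).clamp(0.0, 1.0)
        step_size *= 0.5  # shrinkage keeps later updates small and stable
    return y

under_exposed = torch.rand(3, 64, 64) * 0.3
print(correct_exposure(under_exposed).mean())
```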

* Under Review. Project Page: https://rsliu.tech/PEC 

CoCoNet: Coupled Contrastive Learning Network with Multi-level Feature Ensemble for Multi-modality Image Fusion

Nov 20, 2022
Jinyuan Liu, Runjia Lin, Guanyao Wu, Risheng Liu, Zhongxuan Luo, Xin Fan

Infrared and visible image fusion aims to provide an informative image by combining complementary information from different sensors. Existing learning-based fusion approaches attempt to construct various loss functions to preserve complementary features from both modalities, while neglecting the inter-relationship between the two modalities, leading to redundant or even invalid information in the fusion results. To alleviate these issues, we propose a coupled contrastive learning network, dubbed CoCoNet, to realize infrared and visible image fusion in an end-to-end manner. Concretely, to simultaneously retain typical features from both modalities and remove unwanted information emerging in the fused result, we develop a coupled contrastive constraint in our loss function. In a fused image, its foreground target/background detail part is pulled close to the infrared/visible source and pushed far away from the visible/infrared source in the representation space. We further exploit image characteristics to provide data-sensitive weights, which allows our loss function to build a more reliable relationship with the source images. Furthermore, to learn rich hierarchical feature representations and comprehensively transfer features in the fusion process, a multi-level attention module is established. In addition, we also apply the proposed CoCoNet to medical image fusion of different types, e.g., magnetic resonance and positron emission tomography images, and magnetic resonance and single-photon emission computed tomography images. Extensive experiments demonstrate that our method achieves state-of-the-art (SOTA) performance under both subjective and objective evaluation, especially in preserving prominent targets and recovering vital textural details.
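
The coupled contrastive constraint can be sketched with two triplet terms: foreground features of the fused image take the infrared source as positive and the visible source as negative, and background details do the reverse. The feature stand-ins and margin below are illustrative assumptions.

```python
# Coupled contrastive terms over foreground/background feature pairs.
import torch
import torch.nn.functional as F

def coupled_contrastive(fused_fg, ir_fg, vis_fg, fused_bg, vis_bg, ir_bg, margin=1.0):
    # Foreground: infrared is the positive, visible is the negative.
    fg_term = F.triplet_margin_loss(fused_fg, ir_fg, vis_fg, margin=margin)
    # Background: visible is the positive, infrared is the negative.
    bg_term = F.triplet_margin_loss(fused_bg, vis_bg, ir_bg, margin=margin)
    return fg_term + bg_term

feat = lambda: torch.randn(4, 128)  # stand-in for deep features of each region
loss = coupled_contrastive(feat(), feat(), feat(), feat(), feat(), feat())
print(loss.item())
```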

* 25 pages, 16 figures 

Toward Fast, Flexible, and Robust Low-Light Image Enhancement

Apr 21, 2022
Long Ma, Tengyu Ma, Risheng Liu, Xin Fan, Zhongxuan Luo

Existing low-light image enhancement techniques mostly struggle to balance visual quality against computational efficiency, and they are commonly invalid in unknown complex scenarios. In this paper, we develop a new Self-Calibrated Illumination (SCI) learning framework for fast, flexible, and robust brightening of images in real-world low-light scenarios. Specifically, we establish a cascaded illumination learning process with weight sharing to handle this task. Considering the computational burden of the cascaded pattern, we construct a self-calibrated module that enforces convergence between the results of each stage, so that inference uses only a single basic block (which has not been exploited in previous works) and computation cost is drastically diminished. We then define an unsupervised training loss to elevate the model's capability to adapt to general scenes. Further, we make comprehensive explorations to excavate SCI's inherent properties (lacking in existing works), including operation-insensitive adaptability (acquiring stable performance under different simple operation settings) and model-irrelevant generality (it can be applied to existing illumination-based works to improve their performance). Finally, plenty of experiments and ablation studies fully indicate our superiority in both quality and efficiency. Applications on low-light face detection and nighttime semantic segmentation fully reveal SCI's latent practical value. The source code is available at https://github.com/vis-opt-group/SCI.
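
The weight-sharing cascade can be sketched as a single illumination block reused across stages during training so that inference keeps only that one block; the block design, stage count, and Retinex-style division below are assumptions about the general scheme rather than SCI's exact modules.

```python
# Weight-shared illumination cascade (training) vs. single block (inference).
import torch
import torch.nn as nn

class IlluminationBlock(nn.Module):
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(16, 3, 3, padding=1), nn.Sigmoid(),
        )

    def forward(self, x):
        return self.net(x).clamp(min=1e-3)  # estimated illumination map

block = IlluminationBlock()  # the same weights serve every stage

def train_forward(low, stages=3):
    outputs, x = [], low
    for _ in range(stages):
        illum = block(x)              # shared block at every stage
        enhanced = low / illum        # Retinex-style brightening
        outputs.append(enhanced)
        x = enhanced.clamp(0, 1)      # feed the result to the next stage
    return outputs

def inference(low):
    return low / block(low)           # a single basic block at test time

low = torch.rand(1, 3, 64, 64) * 0.2
print(inference(low).shape)
```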

* CVPR 2022 (Oral). Project page: https://github.com/vis-opt-group/SCI 

Target-aware Dual Adversarial Learning and a Multi-scenario Multi-Modality Benchmark to Fuse Infrared and Visible for Object Detection

Mar 30, 2022
Jinyuan Liu, Xin Fan, Zhanbo Huang, Guanyao Wu, Risheng Liu, Wei Zhong, Zhongxuan Luo

This study addresses the issue of fusing infrared and visible images that appear different for object detection. Aiming to generate an image of high visual quality, previous approaches discover commonalities underlying the two modalities and fuse on the common space via either iterative optimization or deep networks. These approaches neglect that the modality differences, which convey complementary information, are extremely important for both fusion and the subsequent detection task. This paper proposes a bilevel optimization formulation for the joint problem of fusion and detection, which is then unrolled into a target-aware Dual Adversarial Learning (TarDAL) network for fusion and a commonly used detection network. The fusion network, with one generator and dual discriminators, seeks commonalities while learning from differences, preserving structural information of targets from the infrared and textural details from the visible. Furthermore, we build a synchronized imaging system with calibrated infrared and optical sensors, and collect what is currently the most comprehensive benchmark covering a wide range of scenarios. Extensive experiments on several public datasets and our benchmark demonstrate that our method outputs not only visually appealing fused images but also higher detection mAP than state-of-the-art approaches.
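
The one-generator/dual-discriminator layout can be sketched as follows: one discriminator judges whether the fused image keeps infrared target structure and the other whether it keeps visible texture. The tiny networks and losses below are illustrative assumptions, not the exact TarDAL architecture.

```python
# One generator, two discriminators: infrared-oriented and visible-oriented.
import torch
import torch.nn as nn

generator = nn.Conv2d(2, 1, kernel_size=3, padding=1)   # fuses IR + visible
disc_ir = nn.Sequential(nn.Conv2d(1, 8, 3, padding=1), nn.LeakyReLU(0.2),
                        nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(8, 1))
disc_vis = nn.Sequential(nn.Conv2d(1, 8, 3, padding=1), nn.LeakyReLU(0.2),
                         nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(8, 1))

bce = nn.BCEWithLogitsLoss()
ir, vis = torch.rand(2, 1, 64, 64), torch.rand(2, 1, 64, 64)

fused = generator(torch.cat([ir, vis], dim=1))
real, fake = torch.ones(2, 1), torch.zeros(2, 1)

# Generator tries to convince both discriminators at once.
g_loss = bce(disc_ir(fused), real) + bce(disc_vis(fused), real)

# Each discriminator separates its own source modality from the fused output.
d_loss = (bce(disc_ir(ir), real) + bce(disc_ir(fused.detach()), fake)
          + bce(disc_vis(vis), real) + bce(disc_vis(fused.detach()), fake))
```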

* Accepted by IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2022. (Oral) 