Ping Liu

VehicleGAN: Pair-flexible Pose Guided Image Synthesis for Vehicle Re-identification

Nov 27, 2023
Baolu Li, Ping Liu, Lan Fu, Jinlong Li, Jianwu Fang, Zhigang Xu, Hongkai Yu

Vehicle Re-identification (Re-ID) has been broadly studied in the last decade; however, differing camera view angles, which lead to confused discrimination in the feature subspace for vehicles of various poses, remain challenging for Vehicle Re-ID models in the real world. To improve Vehicle Re-ID models, this paper proposes to synthesize a large number of vehicle images in a target pose; the idea is to project vehicles of diverse poses into a unified target pose so as to enhance feature discrimination. Considering that paired data of the same vehicles across different traffic surveillance cameras might not be available in the real world, we propose the first Pair-flexible Pose Guided Image Synthesis method for Vehicle Re-ID, named VehicleGAN, which works in both supervised and unsupervised settings without knowledge of geometric 3D models. Because of the feature distribution difference between real and synthetic data, simply training a traditional metric-learning-based Re-ID model with data-level fusion (i.e., data augmentation) is not satisfactory; we therefore propose a new Joint Metric Learning (JML) approach via effective feature-level fusion of both real and synthetic data. Extensive experimental results on the public VeRi-776 and VehicleID datasets demonstrate the accuracy and effectiveness of the proposed VehicleGAN and JML.
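
The feature-level fusion behind JML can be pictured with a minimal sketch: a shared backbone encodes a real image and its pose-normalized synthetic counterpart, and the two embeddings are fused before the identity/metric losses. The module names, the averaging fusion rule, and the loss choice below are illustrative assumptions, not the paper's exact design.

```python
# Minimal sketch of feature-level fusion for joint metric learning (JML).
# The backbone, the averaging fusion rule, and the identity loss are
# illustrative assumptions, not the authors' exact architecture.
import torch
import torch.nn as nn
import torch.nn.functional as F

class JointMetricModel(nn.Module):
    def __init__(self, backbone: nn.Module, feat_dim: int, num_ids: int):
        super().__init__()
        self.backbone = backbone              # shared encoder for real and synthetic views
        self.classifier = nn.Linear(feat_dim, num_ids)

    def forward(self, real_img, synth_img):
        f_real = self.backbone(real_img)      # features of the original-pose image
        f_synth = self.backbone(synth_img)    # features of the pose-normalized image
        f_fused = 0.5 * (f_real + f_synth)    # fuse at the feature level, not the data level
        return f_fused, self.classifier(f_fused)

def jml_step(model, real_img, synth_img, labels):
    feats, logits = model(real_img, synth_img)
    id_loss = F.cross_entropy(logits, labels)  # identity term; a metric (triplet) term would use `feats`
    return id_loss, F.normalize(feats, dim=1)
```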

CLE Diffusion: Controllable Light Enhancement Diffusion Model

Aug 28, 2023
Yuyang Yin, Dejia Xu, Chuangchuang Tan, Ping Liu, Yao Zhao, Yunchao Wei

Low light enhancement has gained increasing importance with the rapid development of visual creation and editing. However, most existing enhancement algorithms are designed to homogeneously increase the brightness of images to a pre-defined extent, limiting the user experience. To address this issue, we propose the Controllable Light Enhancement Diffusion Model, dubbed CLE Diffusion, a novel diffusion framework that provides users with rich controllability. Built on a conditional diffusion model, CLE Diffusion introduces an illumination embedding that lets users control their desired brightness level. Additionally, we incorporate the Segment-Anything Model (SAM) to enable user-friendly region controllability, where users can click on objects to specify the regions they wish to enhance. Extensive experiments demonstrate that CLE Diffusion achieves competitive performance regarding quantitative metrics, qualitative results, and versatile controllability. Project page: https://yuyangyin.github.io/CLEDiffusion/
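
As a rough sketch of how such conditioning could be wired, the snippet below feeds a scalar brightness level through an embedding MLP and concatenates a clickable region mask as an extra input channel of a denoising UNet. The embedding size, the injection points, and the `unet` signature are assumptions for illustration, not CLE Diffusion's published design.

```python
# Rough sketch of brightness- and region-conditioned denoising.
# The embedding size, the mask-as-extra-channel trick, and the
# `unet(x, t, cond)` signature are assumptions for illustration.
import torch
import torch.nn as nn

class ConditionedDenoiser(nn.Module):
    def __init__(self, unet: nn.Module, cond_dim: int = 128):
        super().__init__()
        self.unet = unet                                   # any UNet taking (x, t, cond)
        self.illum_embed = nn.Sequential(                  # scalar brightness -> embedding
            nn.Linear(1, cond_dim), nn.SiLU(), nn.Linear(cond_dim, cond_dim))

    def forward(self, noisy_img, t, brightness, region_mask):
        # brightness: (B, 1) target level; region_mask: (B, 1, H, W), e.g. from SAM clicks
        cond = self.illum_embed(brightness)                # global illumination condition
        x = torch.cat([noisy_img, region_mask], dim=1)     # mask as an extra input channel
        return self.unet(x, t, cond)                       # predict the noise for this step
```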

* Accepted in Proceedings of the 31st ACM International Conference on Multimedia (MM '23) 

Generating Reliable Pixel-Level Labels for Source Free Domain Adaptation

Jul 03, 2023
Gabriel Tjio, Ping Liu, Yawei Luo, Chee Keong Kwoh, Joey Tianyi Zhou

This work addresses the challenging domain adaptation setting in which knowledge from the labelled source domain dataset is available only through a pretrained black-box segmentation model. The pretrained model's predictions for the target domain images are noisy because of the distributional differences between the source domain data and the target domain data. Since the model's predictions serve as pseudo labels during self-training, the noise in the predictions imposes an upper bound on model performance. Therefore, we propose a simple yet novel image translation workflow, ReGEN, to address this problem. ReGEN comprises an image-to-image translation network and a segmentation network. Our workflow generates target-like images using the noisy predictions for the original target domain images. These target-like images are semantically consistent with the noisy model predictions and can therefore be used to train the segmentation network. In addition to being semantically consistent with the predictions for the original target domain images, the generated target-like images are also stylistically similar to the target domain images. This allows us to leverage the stylistic differences between the target-like images and the target domain images as an additional source of supervision while training the segmentation model. We evaluate our model on two benchmark domain adaptation settings and demonstrate that our approach performs favourably relative to recent state-of-the-art work. The source code will be made available.
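
A simplified view of the resulting self-training loop is sketched below: target-like images are generated from the black-box model's noisy predictions and then used, with those same predictions as pseudo labels, to train the segmentation network. The function and module names are placeholders, and the additional style-based supervision term mentioned above is omitted.

```python
# Simplified self-training step: train the segmentation network on target-like
# images generated from the black-box model's noisy predictions. The names
# `translator` and `seg_net` are placeholders; the style-supervision term is omitted.
import torch
import torch.nn.functional as F

def regen_step(translator, seg_net, target_img, blackbox_pred, optimizer):
    # blackbox_pred: noisy pseudo-label map (B, H, W) from the pretrained source model
    with torch.no_grad():
        target_like = translator(target_img, blackbox_pred)  # semantically consistent with pseudo labels

    logits = seg_net(target_like)                    # segment the generated target-like image
    loss = F.cross_entropy(logits, blackbox_pred)    # pseudo labels supervise the prediction

    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```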

Mitigating Biased Activation in Weakly-supervised Object Localization via Counterfactual Learning

May 24, 2023
Feifei Shao, Yawei Luo, Lei Chen, Ping Liu, Yi Yang, Jun Xiao

In this paper, we focus on the under-explored issue of biased activation in prior weakly-supervised object localization methods based on Class Activation Mapping (CAM). We analyze the cause of this problem from a causal view and attribute it to co-occurring background confounders. Following this insight, we propose a novel Counterfactual Co-occurring Learning (CCL) paradigm that synthesizes counterfactual representations by coupling the constant foreground with unrealized backgrounds, in order to cut off their co-occurring relationship. Specifically, we design a new network structure called Counterfactual-CAM, which embeds a counterfactual representation perturbation mechanism into the vanilla CAM-based model. This mechanism is responsible for decoupling the foreground from the background and synthesizing the counterfactual representations. By training the detection model with these synthesized representations, we compel the model to focus on the constant foreground content while minimizing the influence of distracting co-occurring backgrounds. To the best of our knowledge, this is the first attempt in this direction. Extensive experiments on several benchmarks demonstrate that Counterfactual-CAM successfully mitigates the biased activation problem, achieving improved object localization accuracy.
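
One way to picture the counterfactual coupling is the sketch below: the CAM splits each feature map into a foreground and a background component, and the foreground is re-paired with backgrounds drawn from other samples in the batch. The CAM-based split and the random re-pairing rule are illustrative assumptions rather than the paper's exact mechanism.

```python
# Schematic of counterfactual coupling: keep the foreground component fixed and
# pair it with backgrounds from other images in the batch. The CAM-based split
# and the random re-pairing below are illustrative assumptions.
import torch

def counterfactual_features(features, cam):
    # features: (B, C, H, W); cam: (B, 1, H, W) activation map scaled to [0, 1]
    fg = features * cam                      # foreground component
    bg = features * (1.0 - cam)              # co-occurring background component
    perm = torch.randperm(features.size(0), device=features.device)
    return fg + bg[perm]                     # counterfactual: same foreground, "unrealized" background
```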

* 13 pages, 5 figures, 4 tables 

Text-guided Eyeglasses Manipulation with Spatial Constraints

Apr 25, 2023
Jiacheng Wang, Ping Liu, Jingen Liu, Wei Xu

Virtual try-on of eyeglasses involves placing eyeglasses of different shapes and styles onto a face image without physically trying them on. While existing methods have shown impressive results, the variety of eyeglasses styles is limited and the interactions are not always intuitive or efficient. To address these limitations, we propose a Text-guided Eyeglasses Manipulation method that allows for control of the eyeglasses shape and style based on a binary mask and text, respectively. Specifically, we introduce a mask encoder to extract mask conditions and a modulation module that enables simultaneous injection of text and mask conditions. This design allows for fine-grained control of the eyeglasses' appearance based on both textual descriptions and spatial constraints. Our approach includes a disentangled mapper and a decoupling strategy that preserves irrelevant areas, resulting in better local editing. We employ a two-stage training scheme to handle the different convergence speeds of the various modality conditions, successfully controlling both the shape and style of eyeglasses. Extensive comparison experiments and ablation analyses demonstrate the effectiveness of our approach in achieving diverse eyeglasses styles while preserving irrelevant areas.
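
The simultaneous injection of text and mask conditions can be sketched as a FiLM-style modulation layer: both condition embeddings are concatenated and mapped to per-channel scale and shift parameters applied to an intermediate feature map. The FiLM formulation and the dimensions below are assumptions for illustration, not the paper's exact modulation module.

```python
# Toy modulation module injecting text and mask conditions at the same time.
# The FiLM-style scale/shift formulation is an assumption for illustration.
import torch
import torch.nn as nn

class TextMaskModulation(nn.Module):
    def __init__(self, feat_channels: int, text_dim: int, mask_dim: int):
        super().__init__()
        self.to_scale = nn.Linear(text_dim + mask_dim, feat_channels)
        self.to_shift = nn.Linear(text_dim + mask_dim, feat_channels)

    def forward(self, feat, text_emb, mask_emb):
        # feat: (B, C, H, W); text_emb: (B, text_dim); mask_emb: (B, mask_dim)
        cond = torch.cat([text_emb, mask_emb], dim=1)
        scale = self.to_scale(cond)[..., None, None]   # per-channel scale from both conditions
        shift = self.to_shift(cond)[..., None, None]   # per-channel shift from both conditions
        return feat * (1 + scale) + shift              # text steers style, mask constrains shape
```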

* 14 pages, 12 figures 

UTSGAN: Unseen Transition Suss GAN for Transition-Aware Image-to-image Translation

Apr 24, 2023
Yaxin Shi, Xiaowei Zhou, Ping Liu, Ivor W. Tsang

In the field of Image-to-Image (I2I) translation, ensuring consistency between input images and their translated results is a key requirement for producing high-quality and desirable outputs. Previous I2I methods have relied on result consistency, which enforces consistency between the translated results and the ground truth output, to achieve this goal. However, result consistency is limited in its ability to handle complex and unseen attribute changes in translation tasks. To address this issue, we introduce a transition-aware approach to I2I translation, where the data translation mapping is explicitly parameterized with a transition variable, allowing for the modelling of unobserved translations triggered by unseen transitions. Furthermore, we propose the use of transition consistency, defined on the transition variable, to enable regularization of consistency on unobserved translations, which is omitted in previous works. Based on these insights, we present Unseen Transition Suss GAN (UTSGAN), a generative framework that constructs a manifold for the transition with a stochastic transition encoder and coherently regularizes and generalizes result consistency and transition consistency on both training and unobserved translations with tailor-designed constraints. Extensive experiments on four different I2I tasks performed on five different datasets demonstrate the efficacy of our proposed UTSGAN in performing consistent translations.
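
A minimal way to express the transition variable and the transition consistency term is sketched below: a stochastic encoder infers a transition from an (input, output) pair, the generator re-translates the input under that transition, and the re-inferred transition is pushed back toward the original one. The module names, reparameterization, and L1 penalty are illustrative, not UTSGAN's exact formulation.

```python
# Sketch of a stochastic transition encoder and a transition-consistency term.
# Module names, the reparameterization, and the L1 penalty are illustrative.
import torch
import torch.nn.functional as F

def transition_consistency(trans_encoder, generator, x, y):
    # x: input image; y: translated result (or ground truth when available)
    mu, logvar = trans_encoder(x, y)                       # stochastic transition encoder
    t = mu + torch.randn_like(mu) * (0.5 * logvar).exp()   # reparameterized transition variable
    y_hat = generator(x, t)                                # translation conditioned on the transition
    mu_hat, _ = trans_encoder(x, y_hat)                    # re-infer the transition
    return F.l1_loss(mu_hat, mu)                           # consistency defined on the transition variable
```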

* 17 pages, 17 figures 

Adaptive Stylization Modulation for Domain Generalized Semantic Segmentation

Apr 20, 2023
Gabriel Tjio, Ping Liu, Chee-Keong Kwoh, Joey Tianyi Zhou

Obtaining sufficient labelled data for model training is impractical for most real-life applications. Therefore, we address the problem of domain generalization for semantic segmentation tasks to reduce the need to acquire and label additional data. Recent work on domain generalization increases data diversity by varying domain-variant features such as colour, style and texture in images. However, excessive or even uniform stylization may reduce performance. The performance reduction is especially pronounced for pixels from minority classes, which are already more challenging to classify than pixels from majority classes. Therefore, we introduce a module, $ASH_{+}$, that modulates stylization strength for each pixel depending on the pixel's semantic content. In this work, we also introduce a parameter that balances the element-wise and channel-wise proportion of stylized features with the original source domain features in the stylized source domain images. This learned parameter replaces an empirically determined global hyperparameter, allowing for more fine-grained control over the output stylized image. We conduct multiple experiments to validate the effectiveness of our proposed method. Finally, we evaluate our model on the publicly available benchmark semantic segmentation datasets (Cityscapes and SYNTHIA). Quantitative and qualitative comparisons indicate that our approach is competitive with the state of the art. Code is made available at https://github.com/placeholder
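
A per-pixel modulation of stylization strength can be sketched as follows: a small head predicts a blending weight in [0, 1] from the source features, and stylized and original features are mixed with that weight at each spatial location. The weight predictor and the convex blending rule are assumptions for illustration, not the exact $ASH_{+}$ module.

```python
# Sketch of per-pixel stylization modulation: blend stylized features with the
# original source features using a learned, content-dependent weight.
# The 1x1-conv weight head and the convex blend are illustrative assumptions.
import torch
import torch.nn as nn

class AdaptiveStylization(nn.Module):
    def __init__(self, channels: int):
        super().__init__()
        # predicts a per-pixel blending weight in [0, 1] from the source features
        self.weight_head = nn.Sequential(nn.Conv2d(channels, 1, kernel_size=1), nn.Sigmoid())

    def forward(self, source_feat, stylized_feat):
        alpha = self.weight_head(source_feat)                  # (B, 1, H, W) per-pixel strength
        return alpha * stylized_feat + (1 - alpha) * source_feat
```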

An Operator Theory for Analyzing the Resolution of Multi-illumination Imaging Modalities

Feb 02, 2023
Ping Liu, Habib Ammari

By introducing a new operator theory, we provide a unified mathematical theory for general source resolution in the multi-illumination imaging problem. Our main idea is to transform multi-illumination imaging into single-snapshot imaging with a new imaging kernel that depends on both the illumination patterns and the point spread function of the imaging system. We thus prove that the resolution of multi-illumination imaging is approximately determined by the essential cutoff frequency of the new imaging kernel, which is roughly limited by the sum of the cutoff frequency of the point spread function and the maximum essential frequency in the illumination patterns. Our theory provides a unified way to estimate the resolution of various existing super-resolution modalities and yields the same estimates as those obtained in experiments. In addition, based on the reformulation of the multi-illumination imaging problem, we also estimate the resolution limits for resolving both complex and positive sources by sparsity-based approaches. We show that the resolution of multi-illumination imaging is approximately determined by the new imaging kernel from our operator theory, and that better resolution can be realized by sparsity-promoting techniques in practice, but only for resolving very sparse sources. This explains experimentally observed phenomena in some sparsity-based super-resolution modalities.
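
In informal terms, the stated estimate can be summarized by the relation below, where $\Omega_{\mathrm{PSF}}$ denotes the cutoff frequency of the point spread function and $\Omega_{\mathrm{ill}}$ the maximum essential frequency in the illumination patterns; these symbols are introduced here for illustration and are not the paper's own notation.

```latex
% Informal summary of the stated resolution estimate (notation assumed, not the paper's):
% the effective cutoff of the new imaging kernel is roughly bounded by the sum of the
% PSF cutoff and the highest essential illumination frequency, and the achievable
% resolution scales inversely with that effective cutoff.
\[
\Omega_{\mathrm{eff}} \;\lesssim\; \Omega_{\mathrm{PSF}} + \Omega_{\mathrm{ill}},
\qquad
\text{achievable resolution} \;\sim\; \Omega_{\mathrm{eff}}^{-1}.
\]
```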

Super-resolution of positive near-colliding point sources

Dec 01, 2022
Ping Liu, Habib Ammari

In this paper, we analyze the capacity of super-resolution of one-dimensional positive sources. In particular, we consider the same setting as in [arXiv:1904.09186v2 [math.NA]] and generalize the results there to the case of super-resolving positive sources. To be more specific, we consider resolving $d$ positive point sources with $p \leqslant d$ nodes closely spaced and forming a cluster, while the rest of the nodes are well separated. Similarly to [arXiv:1904.09186v2 [math.NA]], our results show that when the noise level $\epsilon \lesssim \mathrm{SRF}^{-2 p+1}$, where $\mathrm{SRF}=(\Omega \Delta)^{-1}$ with $\Omega$ being the cutoff frequency and $\Delta$ the minimal separation between the nodes, the minimax error rate for reconstructing the cluster nodes is of order $\frac{1}{\Omega} \mathrm{SRF}^{2 p-2} \epsilon$, while for recovering the corresponding amplitudes $\left\{a_j\right\}$ the rate is of order $\mathrm{SRF}^{2 p-1} \epsilon$. For the non-cluster nodes, the corresponding minimax rates for the recovery of nodes and amplitudes are of order $\frac{\epsilon}{\Omega}$ and $\epsilon$, respectively. Our numerical experiments show that the Matrix Pencil method achieves the above optimal bounds when resolving the positive sources.
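
For quick reference, the minimax rates quoted above can be collected in a single display (a restatement of the text, not additional results):

```latex
% Minimax error rates from the abstract, with SRF = (\Omega\Delta)^{-1} and
% noise level \epsilon \lesssim SRF^{-2p+1}.
\[
\underbrace{\frac{1}{\Omega}\,\mathrm{SRF}^{\,2p-2}\,\epsilon}_{\text{cluster nodes}},
\qquad
\underbrace{\mathrm{SRF}^{\,2p-1}\,\epsilon}_{\text{cluster amplitudes}},
\qquad
\underbrace{\frac{\epsilon}{\Omega}}_{\text{non-cluster nodes}},
\qquad
\underbrace{\epsilon}_{\text{non-cluster amplitudes}}.
\]
```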
