Manlin Zhang

UGC: Unified GAN Compression for Efficient Image-to-Image Translation

Sep 17, 2023
Yuxi Ren, Jie Wu, Peng Zhang, Manlin Zhang, Xuefeng Xiao, Qian He, Rui Wang, Min Zheng, Xin Pan

Figures 1–4 for UGC: Unified GAN Compression for Efficient Image-to-Image Translation

Recent years have witnessed rapid progress of Generative Adversarial Networks (GANs) in image-to-image translation. However, the success of these GAN models hinges on heavy computational costs and labor-intensive training data. Current efficient GAN learning techniques usually fall into two orthogonal aspects: i) model slimming via reduced computational cost; ii) data/label-efficient learning with fewer training data/labels. To combine the best of both worlds, we propose a new learning paradigm, Unified GAN Compression (UGC), with a unified optimization objective that seamlessly prompts the synergy of model-efficient and label-efficient learning. UGC sequentially sets up a semi-supervised-driven network architecture search stage and an adaptive online semi-supervised distillation stage, forming a heterogeneous mutual learning scheme that yields an architecture-flexible, label-efficient, and high-performing model.
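The unified objective described above mixes a supervised term (available only for labeled pairs) with an online distillation term against the teacher's outputs. A minimal pure-Python sketch of such a combined loss follows; it is illustrative only, not the paper's implementation, and the names `ugc_loss` and `alpha` are hypothetical.

```python
def ugc_loss(student_out, teacher_out, target, labeled, alpha=0.5):
    """Toy semi-supervised distillation loss over flat lists of floats.

    The supervised MSE term is applied only when the sample is labeled;
    the distillation MSE term against the teacher is applied always,
    which is what lets unlabeled data contribute to training.
    """
    n = len(student_out)
    sup = (sum((s - y) ** 2 for s, y in zip(student_out, target)) / n
           if labeled else 0.0)
    distill = sum((s - t) ** 2 for s, t in zip(student_out, teacher_out)) / n
    # alpha trades off label supervision against teacher mimicry
    return alpha * sup + (1 - alpha) * distill
```

With `alpha=0.5`, an unlabeled sample is driven purely toward the teacher, while a labeled one splits its gradient between ground truth and teacher.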


DiffusionEngine: Diffusion Model is Scalable Data Engine for Object Detection

Sep 07, 2023
Manlin Zhang, Jie Wu, Yuxi Ren, Ming Li, Jie Qin, Xuefeng Xiao, Wei Liu, Rui Wang, Min Zheng, Andy J. Ma

Figures 1–4 for DiffusionEngine: Diffusion Model is Scalable Data Engine for Object Detection

Data is the cornerstone of deep learning. This paper reveals that the recently developed Diffusion Model is a scalable data engine for object detection. Existing methods for scaling up detection-oriented data often require manual collection or generative models to obtain target images, followed by data augmentation and labeling to produce training pairs, which are costly, complex, or lacking in diversity. To address these issues, we present DiffusionEngine (DE), a data scaling-up engine that provides high-quality detection-oriented training pairs in a single stage. DE consists of a pre-trained diffusion model and an effective Detection-Adapter, contributing to generating scalable, diverse, and generalizable detection data in a plug-and-play manner. The Detection-Adapter is learned to align the implicit semantic and location knowledge in off-the-shelf diffusion models with detection-aware signals to make better bounding-box predictions. Additionally, we contribute two datasets, i.e., COCO-DE and VOC-DE, to scale up existing detection benchmarks and facilitate follow-up research. Extensive experiments demonstrate that data scaling-up via DE can achieve significant improvements in diverse scenarios, such as various detection algorithms, self-supervised pre-training, data-sparse, label-scarce, cross-domain, and semi-supervised learning. For example, when using DE with a DINO-based adapter to scale up data, mAP is improved by 3.1% on COCO, 7.6% on VOC, and 11.5% on Clipart.
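The single-stage scaling-up loop can be sketched as follows. `generate_image` and `detection_adapter` are stubs standing in for the pre-trained diffusion model and the learned Detection-Adapter (all names here are hypothetical, not the project's API); the point of the sketch is that every generated image arrives already paired with boxes, so no separate labeling pass is needed.

```python
import random

def generate_image(prompt, seed):
    # Stub standing in for a pre-trained diffusion model:
    # returns an 8x8 grid of floats as a fake "image".
    random.seed(seed)
    return [[random.random() for _ in range(8)] for _ in range(8)]

def detection_adapter(image):
    # Stub standing in for the learned Detection-Adapter:
    # returns a list of (box, label) predictions for the image.
    h, w = len(image), len(image[0])
    return [((0, 0, w // 2, h // 2), "object")]

def scale_up(dataset, prompts, per_prompt=2):
    """Single-stage data engine: each generated image is immediately
    paired with adapter-predicted boxes to form a training pair."""
    new_pairs = []
    for i, prompt in enumerate(prompts):
        for j in range(per_prompt):
            img = generate_image(prompt, seed=i * per_prompt + j)
            new_pairs.append((img, detection_adapter(img)))
    return dataset + new_pairs
```

Swapping the stubs for a real diffusion sampler and a trained adapter would preserve the same plug-and-play structure.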

* Code and Models are publicly available. Project Page: https://mettyz.github.io/DiffusionEngine 

Suppressing Static Visual Cues via Normalizing Flows for Self-Supervised Video Representation Learning

Dec 08, 2021
Manlin Zhang, Jinpeng Wang, Andy J. Ma

Figures 1–4 for Suppressing Static Visual Cues via Normalizing Flows for Self-Supervised Video Representation Learning

Despite the great progress in video understanding made by deep convolutional neural networks, feature representations learned by existing methods may be biased toward static visual cues. To address this issue, we propose a novel method to suppress static visual cues (SSVC) based on probabilistic analysis for self-supervised video representation learning. In our method, video frames are first encoded to obtain latent variables under a standard normal distribution via normalizing flows. By modelling static factors in a video as a random variable, the conditional distribution of each latent variable becomes a shifted and scaled normal. Then, the less-varying latent variables along time are selected as static cues and suppressed to generate motion-preserved videos. Finally, positive pairs are constructed from motion-preserved videos for contrastive learning to alleviate the problem of representation bias toward static cues. The less-biased video representation generalizes better to various downstream tasks. Extensive experiments on publicly available benchmarks demonstrate that the proposed method outperforms the state of the art when only the single RGB modality is used for pre-training.
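The suppression step can be sketched in pure Python: compute each latent dimension's temporal variance across frames, treat the least-varying dimensions as static cues, and replace them with their temporal mean so only the motion-carrying dimensions vary. This is a minimal sketch over list-of-lists latents; `suppress_static` and `keep_ratio` are illustrative names, not the paper's API, and the real method selects cues via its normalizing-flow probabilistic model rather than raw variance.

```python
def suppress_static(latents, keep_ratio=0.5):
    """latents: T frames, each a list of D latent values.
    Returns a motion-preserved sequence with static dims flattened."""
    T, D = len(latents), len(latents[0])
    means = [sum(f[d] for f in latents) / T for d in range(D)]
    var = [sum((f[d] - means[d]) ** 2 for f in latents) / T for d in range(D)]
    # Dims with the lowest temporal variance are treated as static cues.
    order = sorted(range(D), key=lambda d: var[d])
    static = set(order[: int(D * (1 - keep_ratio))])
    # Suppress static dims by pinning them to their temporal mean,
    # so the remaining variation across time reflects motion.
    return [[means[d] if d in static else f[d] for d in range(D)]
            for f in latents]
```

Contrastive positives would then be built from such motion-preserved sequences instead of the raw latents.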

* AAAI 2022. v2: adds supplementary material 