Jian Yuan

DomainStudio: Fine-Tuning Diffusion Models for Domain-Driven Image Generation using Limited Data

Jun 25, 2023
Jingyuan Zhu, Huimin Ma, Jiansheng Chen, Jian Yuan

Denoising diffusion probabilistic models (DDPMs) have proven capable of synthesizing high-quality images with remarkable diversity when trained on large amounts of data. However, typical diffusion models and modern large-scale conditional generative models such as text-to-image models are vulnerable to overfitting when fine-tuned on extremely limited data. Existing works have explored subject-driven generation using a reference set containing a few images, yet few prior works explore DDPM-based domain-driven generation, which aims to learn the common features of target domains while maintaining diversity. This paper proposes DomainStudio, a novel approach to adapt DDPMs pre-trained on large-scale source datasets to target domains using limited data. It is designed to keep the diversity of subjects provided by source domains and to obtain high-quality, diverse adapted samples in target domains. We propose to keep the relative distances between adapted samples to achieve considerable generation diversity, and we further enhance the learning of high-frequency details for better generation quality. Our approach is compatible with both unconditional and conditional diffusion models. This work makes the first attempt to realize unconditional few-shot image generation with diffusion models, achieving better quality and greater diversity than current state-of-the-art GAN-based approaches. It also significantly relieves overfitting in conditional generation and realizes high-quality domain-driven generation, further expanding the applicable scenarios of modern large-scale text-to-image models.

* extended from DDPM-PA (arXiv:2211.03264), 33 pages, 34 figures 
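
To make the high-frequency term concrete, below is a minimal PyTorch sketch, assuming the high-frequency component is taken as the residual of a simple blur and penalized with an extra L1 term; the filter choice, loss form, and function names are illustrative assumptions, not the paper's exact formulation.

```python
import torch
import torch.nn.functional as F

def high_frequency(images, kernel_size=5):
    """Extract high-frequency content as the residual of a box blur.
    images: (B, C, H, W). The blur kernel here is an assumed choice."""
    pad = kernel_size // 2
    c = images.size(1)
    kernel = torch.ones(c, 1, kernel_size, kernel_size,
                        device=images.device) / kernel_size ** 2
    low = F.conv2d(images, kernel, padding=pad, groups=c)  # per-channel blur
    return images - low

def high_frequency_loss(generated, target):
    """Extra term pushing the adapted model to reproduce the
    high-frequency details of target-domain images."""
    return F.l1_loss(high_frequency(generated), high_frequency(target))
```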

Few-shot 3D Shape Generation

May 19, 2023
Jingyuan Zhu, Huimin Ma, Jiansheng Chen, Jian Yuan

Realistic and diverse 3D shape generation is helpful for a wide variety of applications such as virtual reality, gaming, and animation. Modern generative models, such as GANs and diffusion models, learn from large-scale datasets and generate new samples following similar data distributions. However, when training data is limited, deep neural generative networks overfit and tend to replicate training samples. Prior works focus on few-shot image generation to produce high-quality and diverse results using a few target images; unfortunately, abundant 3D shape data is typically just as hard to obtain. In this work, we make the first attempt to realize few-shot 3D shape generation by adapting generative models pre-trained on large source domains to target domains using limited data. To relieve overfitting and keep considerable diversity, we propose to maintain the probability distributions of pairwise relative distances between adapted samples at the feature level and shape level during domain adaptation. Our approach needs only the silhouettes of few-shot target samples as training data to learn target geometry distributions, and produces generated shapes with diverse topology and textures. Moreover, we introduce several metrics to evaluate the quality and diversity of few-shot 3D shape generation. The effectiveness of our approach is demonstrated qualitatively and quantitatively under a series of few-shot 3D shape adaptation setups.

* 23 pages, 15 figures 
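
The two-level regularizer can be sketched as follows, assuming L2 distances softmax-normalized into per-sample distributions and matched with a KL divergence at both the feature and shape levels; these specific choices are illustrative, not the paper's exact formulation.

```python
import torch
import torch.nn.functional as F

def relative_distance_probs(samples):
    """Convert pairwise L2 distances within a batch into per-sample
    probability distributions over the other samples."""
    n = samples.size(0)
    flat = samples.reshape(n, -1)
    dists = torch.cdist(flat, flat)                               # (N, N)
    off_diag = ~torch.eye(n, dtype=torch.bool, device=flat.device)
    dists = dists[off_diag].view(n, n - 1)
    return F.softmax(-dists, dim=-1)      # closer pairs -> higher probability

def adaptation_regularizer(adapted_feats, source_feats,
                           adapted_shapes, source_shapes,
                           w_feat=1.0, w_shape=1.0):
    """Match the adapted model's relative-distance distributions to the
    frozen source model's at both levels (weights are assumed)."""
    def kl(p_target, p_model):
        return F.kl_div(p_model.log(), p_target, reduction="batchmean")
    return (w_feat * kl(relative_distance_probs(source_feats),
                        relative_distance_probs(adapted_feats))
            + w_shape * kl(relative_distance_probs(source_shapes),
                           relative_distance_probs(adapted_shapes)))
```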

Prediction with Incomplete Data under Agnostic Mask Distribution Shift

May 18, 2023
Yichen Zhu, Jian Yuan, Bo Jiang, Tao Lin, Haiming Jin, Xinbing Wang, Chenghu Zhou

Data with missing values is ubiquitous in many applications. Recent years have witnessed increasing attention on prediction with only incomplete data, consisting of observed features and a mask that indicates the missing pattern. Existing methods assume that the training and testing distributions are the same, which may be violated in real-world scenarios. In this paper, we consider prediction with incomplete data in the presence of distribution shift. We focus on the case where the underlying joint distribution of complete features and label is invariant, but the missing pattern, i.e., the mask distribution, may shift agnostically between training and testing. To achieve generalization, we leverage the observation that for each mask there is an invariant optimal predictor. To avoid the exponential explosion of learning these predictors separately, we approximate them jointly using a double parameterization technique. This has the undesirable side effect of allowing the learned predictors to rely on intra-mask correlations and on correlations between features and mask; we perform decorrelation to minimize this effect. Combining these techniques, we propose a novel prediction method called StableMiss. Extensive experiments on both synthetic and real-world datasets show that StableMiss is robust and outperforms state-of-the-art methods under agnostic mask distribution shift.
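
A hedged sketch of the double-parameterization idea: rather than one predictor per mask (exponentially many), a second set of parameters maps the mask to modulations of a shared backbone, so all per-mask predictors are represented jointly. The architecture below is an illustrative guess, not StableMiss's actual design, and the decorrelation step is omitted.

```python
import torch
import torch.nn as nn

class MaskConditionedPredictor(nn.Module):
    """Jointly parameterize the per-mask optimal predictors: a small
    mask network produces (scale, shift) modulations for a shared
    backbone. Purely illustrative; layer sizes are assumptions."""
    def __init__(self, n_features, hidden=64):
        super().__init__()
        self.backbone = nn.Sequential(nn.Linear(n_features, hidden), nn.ReLU())
        # Second parameterization: mask -> modulation of the hidden layer.
        self.mask_net = nn.Linear(n_features, 2 * hidden)
        self.head = nn.Linear(hidden, 1)

    def forward(self, x, mask):
        # Missing entries are zero-filled; the mask says which are observed.
        h = self.backbone(x * mask)
        scale, shift = self.mask_net(mask.float()).chunk(2, dim=-1)
        return self.head(h * (1 + scale) + shift)
```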


MotionVideoGAN: A Novel Video Generator Based on the Motion Space Learned from Image Pairs

Mar 06, 2023
Jingyuan Zhu, Huimin Ma, Jiansheng Chen, Jian Yuan

Video generation has achieved rapid progress benefiting from the high-quality renderings provided by powerful image generators. We regard the video synthesis task as generating a sequence of images that share the same content but vary in motion. However, most previous video synthesis frameworks based on pre-trained image generators treat content and motion generation separately, leading to unrealistic generated videos. We therefore design a novel framework that builds a motion space, aiming to achieve content consistency and fast convergence for video generation. We present MotionVideoGAN, a novel video generator that synthesizes videos based on the motion space learned by pre-trained image pair generators. First, we propose an image pair generator named MotionStyleGAN to generate image pairs that share the same content but exhibit various motions. We then acquire motion codes that edit one image in a generated pair while keeping the other unchanged. These motion codes let us edit images within the motion space, since the edited image shares the same content as its unchanged counterpart. Finally, we introduce a latent code generator that produces latent code sequences from motion codes for video generation. Our approach achieves state-of-the-art performance on UCF101, the most complex video dataset used to date for unconditional video generation evaluation.

* Accepted by IEEE Transactions on Multimedia as a regular paper 
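
As a rough illustration of the final step, the sketch below rolls out a latent-code sequence from a fixed content code and per-frame motion codes, so every frame keeps the same content while motion varies. The GRU-based update and all names here are hypothetical stand-ins for the paper's latent code generator.

```python
import torch
import torch.nn as nn

class LatentCodeGenerator(nn.Module):
    """Produce a latent-code sequence by repeatedly applying sampled
    motion codes to a shared content code (illustrative design)."""
    def __init__(self, latent_dim, motion_dim):
        super().__init__()
        self.rnn = nn.GRUCell(motion_dim, latent_dim)

    def forward(self, content_code, motion_codes):
        # content_code: (B, latent_dim); motion_codes: (T, B, motion_dim)
        z, frames = content_code, []
        for m in motion_codes:      # one motion code per frame transition
            z = self.rnn(m, z)      # edit the latent within the motion space
            frames.append(z)
        # (T, B, latent_dim), each code fed to the pre-trained image generator
        return torch.stack(frames)
```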

Few-shot Image Generation with Diffusion Models

Nov 11, 2022
Jingyuan Zhu, Huimin Ma, Jiansheng Chen, Jian Yuan

Denoising diffusion probabilistic models (DDPMs) have proven capable of synthesizing high-quality images with remarkable diversity when trained on large amounts of data. However, to our knowledge, few-shot image generation tasks have yet to be studied with DDPM-based approaches. Modern approaches are mainly built on Generative Adversarial Networks (GANs) and adapt models pre-trained on large source domains to target domains using a few available samples. In this paper, we make the first attempt to study when DDPMs overfit and suffer severe diversity degradation as training data becomes scarce. We then fine-tune DDPMs pre-trained on large source domains directly on limited target data. Our results show that utilizing knowledge from pre-trained models can accelerate convergence and improve generation quality and diversity compared with training from scratch. However, the fine-tuned models still fail to retain some diverse features and achieve only limited diversity. Therefore, we propose a pairwise DDPM adaptation (DDPM-PA) approach based on a pairwise similarity loss that preserves the relative distances between generated samples during domain adaptation. DDPM-PA further improves generation diversity and outperforms current state-of-the-art GAN-based approaches. We demonstrate the effectiveness of DDPM-PA on a series of few-shot image generation tasks qualitatively and quantitatively.
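
A minimal sketch of a pairwise similarity loss of this kind: for features of samples generated from the same noise inputs by the frozen source model and the adapted model, per-sample cosine-similarity distributions are compared so that relative distances are preserved. The similarity measure, normalization, and divergence are assumptions for illustration; DDPM-PA's exact loss may differ.

```python
import torch
import torch.nn.functional as F

def pairwise_similarity_probs(feats):
    """Per-sample softmax distribution over cosine similarities to the
    other samples in the batch. feats: (N, D)."""
    sim = F.cosine_similarity(feats.unsqueeze(1), feats.unsqueeze(0), dim=-1)
    n = feats.size(0)
    off_diag = ~torch.eye(n, dtype=torch.bool, device=feats.device)
    return F.softmax(sim[off_diag].view(n, n - 1), dim=-1)

def pairwise_similarity_loss(adapted_feats, source_feats):
    """Keep the adapted model's relative distances close to the frozen
    source model's (KL between the two similarity distributions)."""
    p_source = pairwise_similarity_probs(source_feats)
    p_adapted = pairwise_similarity_probs(adapted_feats)
    return F.kl_div(p_adapted.log(), p_source, reduction="batchmean")
```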


Few-shot Image Generation via Masked Discrimination

Oct 27, 2022
Jingyuan Zhu, Huimin Ma, Jiansheng Chen, Jian Yuan

Few-shot image generation aims to generate images of high quality and great diversity with limited data. However, it is difficult for modern GANs to avoid overfitting when trained on only a few images: the discriminator can easily memorize all the training samples and guide the generator to replicate them, leading to severe diversity degradation. Several methods have been proposed to relieve overfitting by adapting GANs pre-trained on large source domains to target domains with limited real samples. In this work, we present a novel approach to realize few-shot GAN adaptation via masked discrimination. Random masks are applied to the features the discriminator extracts from input images. This encourages the discriminator to judge as realistic a wider range of images that share only some features with the training samples, and correspondingly guides the generator to produce more diverse images instead of replicating the training data. In addition, we employ a cross-domain consistency loss on the discriminator to keep the relative distances between samples in its feature space. This loss serves as an additional optimization target alongside the adversarial loss and guides adapted GANs to preserve more information learned from source domains, yielding higher image quality. The effectiveness of our approach is demonstrated both qualitatively and quantitatively, with higher quality and greater diversity than prior methods on a series of few-shot image generation tasks.
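
A minimal sketch of the masking step, assuming element-wise Bernoulli masks on intermediate discriminator features; the masking granularity and keep probability are illustrative choices, not the paper's stated configuration.

```python
import torch

def mask_discriminator_features(feats, keep_prob=0.5):
    """Randomly zero out entries of the discriminator's intermediate
    features so realism must be judged from partial features; images
    sharing only some features with the few training samples can then
    still be judged realistic. keep_prob is an assumed hyperparameter."""
    mask = (torch.rand_like(feats) < keep_prob).float()
    return feats * mask / keep_prob  # rescale to preserve expected magnitude
```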


Learning to Advise and Learning from Advice in Cooperative Multi-Agent Reinforcement Learning

May 23, 2022
Yue Jin, Shuangqing Wei, Jian Yuan, Xudong Zhang

Learning to coordinate is a daunting problem in multi-agent reinforcement learning (MARL). Previous works have explored it from many facets, including cognition between agents, credit assignment, communication, and expert demonstration. However, less attention has been paid to agents' decision structure and the hierarchy of coordination. In this paper, we explore the spatiotemporal structure of agents' decisions and consider the hierarchy of coordination from the perspective of multilevel emergence dynamics, based on which a novel approach, Learning to Advise and Learning from Advice (LALA), is proposed to improve MARL. Specifically, by distinguishing the hierarchy of coordination, we propose to enhance decision coordination at the meso level with an advisor and to leverage a policy discriminator to advise agents' learning at the micro level. The advisor learns to aggregate decision information in both spatial and temporal domains and generates coordinated decisions by employing a spatiotemporal dual graph convolutional neural network with a task-oriented objective function. Each agent learns from the advice via a policy generative adversarial learning method in which a discriminator distinguishes between the policies of the agent and the advisor and boosts both of them based on its judgment. Experimental results indicate the advantage of LALA over baseline approaches in terms of both learning efficiency and coordination capability. The coordination mechanism is investigated from the perspectives of multilevel emergence dynamics and mutual information, providing a novel way to analyze and improve MARL algorithms.
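
A hedged sketch of the learning-from-advice step, written in the spirit of adversarial imitation: a discriminator separates agent and advisor action distributions, and the agent is trained to be indistinguishable from the advisor. The discriminator interface and loss form here are hypothetical, not LALA's exact objective.

```python
import torch
import torch.nn.functional as F

def policy_adversarial_losses(discriminator, states, agent_probs, advisor_probs):
    """discriminator(states, action_probs) -> probability in (0, 1) that
    the action distribution comes from the advisor (assumed interface)."""
    # Discriminator update: advisor policies are "real" (1), agent "fake" (0).
    d_advisor = discriminator(states, advisor_probs.detach())
    d_agent = discriminator(states, agent_probs.detach())
    disc_loss = (F.binary_cross_entropy(d_advisor, torch.ones_like(d_advisor))
                 + F.binary_cross_entropy(d_agent, torch.zeros_like(d_agent)))
    # Agent update: make its policy indistinguishable from the advice.
    agent_loss = F.binary_cross_entropy(discriminator(states, agent_probs),
                                        torch.ones_like(d_agent))
    return disc_loss, agent_loss
```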


Information-Bottleneck-Based Behavior Representation Learning for Multi-agent Reinforcement Learning

Sep 29, 2021
Yue Jin, Shuangqing Wei, Jian Yuan, Xudong Zhang

In multi-agent deep reinforcement learning, extracting sufficient and compact information about other agents is critical for efficient convergence and scalability of an algorithm. In canonical frameworks, such information is often distilled either implicitly and uninterpretably, or explicitly with cost functions that fail to reflect the trade-off between information compression and utility in the representation. In this paper, we present Information-Bottleneck-based Other agents' behavior Representation learning for Multi-agent reinforcement learning (IBORM), which explicitly seeks a low-dimensional encoder that yields a compact and informative representation of other agents' behaviors. IBORM leverages the information bottleneck principle to compress observation information while retaining the information relevant to other agents' behaviors that is needed for cooperative decision-making. Empirical results demonstrate that IBORM delivers the fastest convergence and the best-performing learned policies compared with implicit behavior representation learning and with explicit behavior representation learning that does not account for information compression and utility.
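
A minimal sketch of such a bottleneck objective under standard variational-IB assumptions (Gaussian encoder, unit-Gaussian prior): the code z must stay predictive of other agents' actions while a KL term limits how much observation information it carries. The layer shapes and the weight beta are illustrative, not IBORM's exact architecture.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class IBEncoder(nn.Module):
    """Stochastic encoder plus behavior predictor (illustrative sizes)."""
    def __init__(self, obs_dim, z_dim, n_actions):
        super().__init__()
        self.enc = nn.Linear(obs_dim, 2 * z_dim)
        self.dec = nn.Linear(z_dim, n_actions)

    def forward(self, obs):
        mu, log_var = self.enc(obs).chunk(2, dim=-1)
        z = mu + torch.randn_like(mu) * (0.5 * log_var).exp()  # reparameterization
        return self.dec(z), mu, log_var

def ib_loss(logits, other_agent_actions, mu, log_var, beta=1e-3):
    # Sufficiency: predict the other agents' behavior from z.
    pred = F.cross_entropy(logits, other_agent_actions)
    # Compression: KL(q(z|o) || N(0, I)) upper-bounds I(z; o).
    kl = 0.5 * (mu.pow(2) + log_var.exp() - 1 - log_var).sum(-1).mean()
    return pred + beta * kl
```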


Supervised Off-Policy Ranking

Jul 03, 2021
Yue Jin, Yue Zhang, Tao Qin, Xudong Zhang, Jian Yuan, Houqiang Li, Tie-Yan Liu

Off-policy evaluation (OPE) leverages data generated by other policies to evaluate a target policy. Previous OPE methods mainly focus on precisely estimating the true performance of a policy. We observe that in many applications, (1) the end goal of OPE is to compare two or more candidate policies and choose a good one, which is a much simpler task than evaluating their true performance; and (2) there are usually multiple policies that have already been deployed in real-world systems and whose true performance is therefore known from serving real users. Inspired by these two observations, we define a new problem, supervised off-policy ranking (SOPR), which aims to rank a set of new target policies via supervised learning, leveraging off-policy data and policies with known performance. We further propose a method for SOPR that learns a policy scoring model by correctly ranking the training policies with known performance rather than estimating their precise performance. Our method uses logged states and policies to learn a Transformer-based model that maps offline interaction data, including logged states and the actions a target policy takes on those states, to a score. Experiments on different games, datasets, training policy sets, and test policy sets show that our method outperforms strong baseline OPE methods in terms of both rank correlation and the performance gap between the truly best policy and the best of the top three ranked policies. Furthermore, our method is more stable than the baselines.
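
A hedged sketch of the ranking objective: given scores from the scoring model and the known performance of the training policies, a logistic pairwise loss rewards correct orderings instead of regressing performance itself. The loss form is an assumption for illustration; the paper's training objective may differ.

```python
import torch
import torch.nn.functional as F

def pairwise_ranking_loss(scores, true_perf):
    """scores: (N,) model scores for N training policies;
    true_perf: (N,) their known true performance."""
    s_diff = scores.unsqueeze(1) - scores.unsqueeze(0)      # [i, j] = s_i - s_j
    better = (true_perf.unsqueeze(1) > true_perf.unsqueeze(0)).float()
    # Logistic loss over all pairs where policy i truly beats policy j:
    # penalize the model when it scores the worse policy higher.
    loss = F.softplus(-s_diff) * better
    return loss.sum() / better.sum().clamp(min=1)
```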
