Chaochao Lu

ConditionVideo: Training-Free Condition-Guided Text-to-Video Generation

Oct 11, 2023
Bo Peng, Xinyuan Chen, Yaohui Wang, Chaochao Lu, Yu Qiao

Recent works have successfully extended large-scale text-to-image models to the video domain, producing promising results but at a high computational cost and requiring a large amount of video data. In this work, we introduce ConditionVideo, a training-free approach to text-to-video generation based on the provided condition, video, and input text, by leveraging the power of off-the-shelf text-to-image generation methods (e.g., Stable Diffusion). ConditionVideo generates realistic dynamic videos from random noise or given scene videos. Our method explicitly disentangles the motion representation into condition-guided and scenery motion components. To this end, the ConditionVideo model is designed with a UNet branch and a control branch. To improve temporal coherence, we introduce sparse bi-directional spatial-temporal attention (sBiST-Attn). The 3D control network extends the conventional 2D ControlNet model, aiming to strengthen conditional generation accuracy by additionally leveraging bi-directional frames in the temporal domain. Our method exhibits superior performance in terms of frame consistency, CLIP score, and conditional accuracy, outperforming the other compared methods.
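
The snippet below is a minimal sketch of how sparse bi-directional spatial-temporal attention might look in PyTorch: each frame's spatial tokens attend to tokens gathered from sparsely strided frames both before and after it. The tensor layout, the projection modules, and the stride parameter are illustrative assumptions, not the paper's exact design.

import torch

def sbist_attention(x, to_q, to_k, to_v, stride=3):
    # x: (batch, frames, tokens, dim) latent features of a video clip.
    b, f, n, d = x.shape
    out = torch.empty_like(x)
    for t in range(f):
        # Sparse bi-directional context: strided frames before and after frame t,
        # plus frame t itself.
        ctx_idx = sorted(set(range(0, f, stride)) | {t})
        ctx = x[:, ctx_idx].reshape(b, -1, d)                 # (b, |ctx|*n, d)
        q = to_q(x[:, t])                                     # (b, n, d)
        k, v = to_k(ctx), to_v(ctx)
        attn = torch.softmax(q @ k.transpose(-1, -2) / d ** 0.5, dim=-1)
        out[:, t] = attn @ v                                  # (b, n, d)
    return out

# Toy usage: 2 clips, 8 frames, 16 spatial tokens, dim 64.
d = 64
x = torch.randn(2, 8, 16, d)
to_q, to_k, to_v = (torch.nn.Linear(d, d) for _ in range(3))
y = sbist_attention(x, to_q, to_k, to_v)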

InfoPrompt: Information-Theoretic Soft Prompt Tuning for Natural Language Understanding

Jun 08, 2023
Junda Wu, Tong Yu, Rui Wang, Zhao Song, Ruiyi Zhang, Handong Zhao, Chaochao Lu, Shuai Li, Ricardo Henao

Soft prompt tuning achieves superior performance across a wide range of few-shot tasks. However, the performance of prompt tuning can be highly sensitive to the initialization of the prompts. We also empirically observe that conventional prompt tuning methods cannot encode and learn sufficient task-relevant information from prompt tokens. In this work, we develop an information-theoretic framework that formulates soft prompt tuning as maximizing the mutual information between prompts and other model parameters (or encoded representations). This novel view helps us develop a more efficient, accurate, and robust soft prompt tuning method, InfoPrompt. Within this framework, we develop two novel mutual-information-based loss functions to (i) discover proper prompt initialization for the downstream tasks and learn sufficient task-relevant information from prompt tokens, and (ii) encourage the output representation from the pretrained language model to be more aware of the task-relevant information captured in the learnt prompt. Extensive experiments validate that InfoPrompt can significantly accelerate the convergence of prompt tuning and outperform traditional prompt tuning methods. Finally, we provide a formal theoretical result showing that a gradient-descent-type algorithm can be used to train our mutual information loss.
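
As a rough illustration of the "maximize mutual information between prompts and representations" idea, the sketch below uses an InfoNCE lower bound between pooled prompt embeddings and the encoder's output representations. The paper's two loss functions are more specific than this; the function and variable names here are assumptions for illustration only.

import torch
import torch.nn.functional as F

def infonce_mi_lower_bound(prompt_emb, repr_emb, temperature=0.1):
    # prompt_emb, repr_emb: (batch, dim) pooled embeddings from the same batch.
    p = F.normalize(prompt_emb, dim=-1)
    r = F.normalize(repr_emb, dim=-1)
    logits = p @ r.t() / temperature                  # (batch, batch) similarities
    labels = torch.arange(p.size(0), device=p.device)
    # Matched prompt/representation pairs are positives; other rows act as negatives.
    return -F.cross_entropy(logits, labels)           # InfoNCE lower bound on MI

# A training step could then minimize: task_loss - lam * infonce_mi_lower_bound(p, r)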

Action-Sufficient State Representation Learning for Control with Structural Constraints

Oct 12, 2021
Biwei Huang, Chaochao Lu, Liu Leqi, José Miguel Hernández-Lobato, Clark Glymour, Bernhard Schölkopf, Kun Zhang

Perceived signals in real-world scenarios are usually high-dimensional and noisy, and finding and using a representation that contains the essential and sufficient information required by downstream decision-making tasks helps improve computational efficiency and generalization ability in those tasks. In this paper, we focus on partially observable environments and propose to learn a minimal set of state representations that capture sufficient information for decision-making, termed \textit{Action-Sufficient state Representations} (ASRs). We build a generative environment model for the structural relationships among variables in the system and present a principled way to characterize ASRs based on structural constraints and the goal of maximizing cumulative reward in policy learning. We then develop a structured sequential Variational Auto-Encoder to estimate the environment model and extract ASRs. Our empirical results on CarRacing and VizDoom demonstrate a clear advantage of learning and using ASRs for policy learning. Moreover, the estimated environment model and ASRs allow learning behaviors from imagined outcomes in the compact latent space to improve sample efficiency.
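
Schematically, a structured sequential VAE of this kind maximizes a per-trajectory evidence lower bound of roughly the form below. This is the generic latent state-space ELBO in standard notation, not the paper's exact objective, which additionally encodes the structural constraints defining ASRs.

$\mathcal{L} = \mathbb{E}_{q}\big[\sum_{t=1}^{T} \log p(o_t \mid s_t) + \log p(r_t \mid s_t, a_t) - \mathrm{KL}\big(q(s_t \mid o_{\le t}, a_{<t}) \,\|\, p(s_t \mid s_{t-1}, a_{t-1})\big)\big]$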

AdaRL: What, Where, and How to Adapt in Transfer Reinforcement Learning

Jul 07, 2021
Biwei Huang, Fan Feng, Chaochao Lu, Sara Magliacane, Kun Zhang

Most approaches in reinforcement learning (RL) are data-hungry and specific to fixed environments. In this paper, we propose a principled framework for adaptive RL, called AdaRL, that adapts reliably to changes across domains. Specifically, we construct a generative environment model for the structural relationships among variables in the system and embed the changes in a compact way, which provides a clear and interpretable picture for locating what and where the changes are and how to adapt. Based on the environment model, we characterize a minimal set of representations, including both domain-specific factors and domain-shared state representations, that suffice for reliable and low-cost transfer. Moreover, we show that by explicitly leveraging a compact representation to encode changes, we can adapt the policy with only a few samples without further policy optimization in the target domain. We illustrate the efficacy of AdaRL through a series of experiments that allow for changes in different components of Cartpole and Atari games.
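
A toy sketch of the "encode changes compactly, adapt only the compact part" idea follows: a dynamics model shared across domains is conditioned on a low-dimensional per-domain change vector theta, and adapting to a new domain fits only theta from a handful of transitions. The names and architecture are illustrative assumptions, not AdaRL's actual model.

import torch
import torch.nn as nn

class SharedDynamics(nn.Module):
    # Predicts the next state from (state, action, theta_domain); the network
    # weights are shared across domains, only theta_domain differs.
    def __init__(self, s_dim, a_dim, theta_dim, hidden=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(s_dim + a_dim + theta_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, s_dim))

    def forward(self, s, a, theta):
        return self.net(torch.cat([s, a, theta.expand(s.size(0), -1)], dim=-1))

def adapt_to_new_domain(model, s, a, s_next, theta_dim, steps=200, lr=1e-2):
    # Few-shot adaptation: freeze the shared weights, fit only the change vector.
    for p in model.parameters():
        p.requires_grad_(False)
    theta = torch.zeros(1, theta_dim, requires_grad=True)
    opt = torch.optim.Adam([theta], lr=lr)
    for _ in range(steps):
        loss = ((model(s, a, theta) - s_next) ** 2).mean()
        opt.zero_grad(); loss.backward(); opt.step()
    return theta.detach()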

Nonlinear Invariant Risk Minimization: A Causal Approach

Feb 24, 2021
Chaochao Lu, Yuhuai Wu, José Miguel Hernández-Lobato, Bernhard Schölkopf

Due to spurious correlations, machine learning systems often fail to generalize to environments whose distributions differ from the ones used at training time. Prior work addressing this, either explicitly or implicitly, attempted to find a data representation that has an invariant causal relationship with the target. This is done by leveraging a diverse set of training environments to reduce the effect of spurious features and build an invariant predictor. However, these methods have generalization guarantees only when both the data representation and the classifiers come from a linear model class. We propose Invariant Causal Representation Learning (ICRL), a learning paradigm that enables out-of-distribution (OOD) generalization in the nonlinear setting (i.e., nonlinear representations and nonlinear classifiers). It builds upon a practical and general assumption: the prior over the data representation factorizes when conditioning on the target and the environment. Based on this, we show identifiability of the data representation up to very simple transformations. We also prove that all direct causes of the target can be fully discovered, which further enables us to obtain generalization guarantees in the nonlinear setting. Extensive experiments on both synthetic and real-world datasets show that our approach significantly outperforms a variety of baseline methods. Finally, in the concluding discussion, we further explore the aforementioned assumption and propose a general view, called the Agnostic Hypothesis: there exists a set of hidden causal factors affecting both inputs and outcomes. The Agnostic Hypothesis can provide a unifying view of machine learning in terms of representation learning. More importantly, it can inspire a new direction to explore the general theory for identifying hidden causal factors, which is key to enabling OOD generalization guarantees in machine learning.
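
The key assumption above can be stated compactly: conditioning on the target $Y$ and the environment $E$, the prior over the latent data representation $Z$ factorizes across its components. This is a schematic statement in generic notation, not the paper's formal setup.

$p(Z \mid Y, E) = \prod_{i} p(Z_i \mid Y, E)$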

Sample-Efficient Reinforcement Learning via Counterfactual-Based Data Augmentation

Dec 16, 2020
Chaochao Lu, Biwei Huang, Ke Wang, José Miguel Hernández-Lobato, Kun Zhang, Bernhard Schölkopf

Reinforcement learning (RL) algorithms usually require a substantial amount of interaction data and perform well only for specific tasks in a fixed environment. In some scenarios such as healthcare, however, usually only a few records are available for each patient, and patients may show different responses to the same treatment, impeding the application of current RL algorithms to learn optimal policies. To address the issues of mechanism heterogeneity and related data scarcity, we propose a data-efficient RL algorithm that exploits structural causal models (SCMs) to model the state dynamics, which are estimated by leveraging both commonalities and differences across subjects. The learned SCM enables us to counterfactually reason what would have happened had another treatment been taken. It helps avoid real (possibly risky) exploration and mitigates the issue that limited experiences lead to biased policies. We propose counterfactual RL algorithms to learn both population-level and individual-level policies. We show that counterfactual outcomes are identifiable under mild conditions and that Q-learning on the counterfactual-based augmented data set converges to the optimal value function. Experimental results on synthetic and real-world data demonstrate the efficacy of the proposed approach.

* Neural Information Processing Systems Workshop on Offline Reinforcement Learning 
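
A minimal illustration of the counterfactual reasoning step, assuming a simple additive-noise structural causal model for the state transition (the paper's SCMs and their estimation are richer than this; names here are illustrative):

import numpy as np

# Assume a learned transition SCM of the form s_next = f(s, a) + u, where u is
# an individual-specific exogenous noise term.
def f(s, a):
    return 0.9 * s + 0.5 * a          # stand-in for a learned transition function

def counterfactual_next_state(s, a_observed, s_next_observed, a_alternative):
    # 1. Abduction: infer the exogenous noise consistent with the observed transition.
    u = s_next_observed - f(s, a_observed)
    # 2. Action: replace the observed treatment with the alternative one.
    # 3. Prediction: roll the SCM forward with the same noise.
    return f(s, a_alternative) + u

s, a_obs, a_alt = 1.0, 0.0, 1.0
s_next_obs = f(s, a_obs) + np.random.normal(scale=0.1)
print(counterfactual_next_state(s, a_obs, s_next_obs, a_alt))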

Interpreting Spatially Infinite Generative Models

Jul 24, 2020
Chaochao Lu, Richard E. Turner, Yingzhen Li, Nate Kushman

Traditional deep generative models of images and other spatial modalities can only generate fixed-size outputs. The generated images have exactly the same resolution as the training images, which is dictated by the number of layers in the underlying neural network. Recent work has shown, however, that feeding spatial noise vectors into a fully convolutional neural network enables both generation of arbitrary-resolution output images and training on arbitrary-resolution training images. While this work has provided impressive empirical results, little theoretical interpretation was provided to explain the underlying generative process. In this paper we provide a firm theoretical interpretation for infinite spatial generation, by drawing connections to spatial stochastic processes. We use the resulting intuition to improve upon existing spatially infinite generative models to enable more efficient training through a model that we call an infinite generative adversarial network, or $\infty$-GAN. Experiments on world map generation, panoramic images and texture synthesis verify the ability of $\infty$-GAN to efficiently generate images of arbitrary size.

* ICML 2020 workshop on Human Interpretability in Machine Learning (WHI 2020) 
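
The mechanism described above, feeding a spatial noise map through a fully convolutional generator so that output resolution is set by the noise rather than the architecture, can be sketched as follows (a toy generator, not the $\infty$-GAN architecture itself):

import torch
import torch.nn as nn

# A fully convolutional generator: with no fully connected layers, the output's
# spatial size is determined entirely by the spatial size of the input noise map.
generator = nn.Sequential(
    nn.Conv2d(8, 64, 3, padding=1), nn.ReLU(),
    nn.Conv2d(64, 64, 3, padding=1), nn.ReLU(),
    nn.Conv2d(64, 3, 3, padding=1), nn.Tanh())

small = generator(torch.randn(1, 8, 16, 16))    # -> (1, 3, 16, 16)
large = generator(torch.randn(1, 8, 128, 256))  # -> (1, 3, 128, 256), same weights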

Deconfounding Reinforcement Learning in Observational Settings

Dec 26, 2018
Chaochao Lu, Bernhard Schölkopf, José Miguel Hernández-Lobato

We propose a general formulation for addressing reinforcement learning (RL) problems in settings with observational data. That is, we consider the problem of learning good policies solely from historical data in which unobserved factors (confounders) affect both observed actions and rewards. Our formulation allows us to extend a representative RL algorithm, the Actor-Critic method, to its deconfounding variant, with the methodology for this extension being easily applied to other RL algorithms. In addition to this, we develop a new benchmark for evaluating deconfounding RL algorithms by modifying the OpenAI Gym environments and the MNIST dataset. Using this benchmark, we demonstrate that the proposed algorithms are superior to traditional RL methods in confounded environments with observational data. To the best of our knowledge, this is the first time that confounders are taken into consideration for addressing full RL problems with observational data. Code is available at https://github.com/CausalRL/DRL.

* 30 pages 
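
To make the confounding structure concrete, the toy snippet below generates observational data in which a hidden factor influences both the logging policy's action and the reward, which is the situation the deconfounding formulation targets. It is purely illustrative and is not the paper's benchmark.

import numpy as np

rng = np.random.default_rng(0)

def generate_observational_step():
    u = rng.normal()                       # unobserved confounder
    s = rng.normal()                       # observed state
    # The logging policy's action depends on both the state and the confounder.
    a = int(s + u + rng.normal(scale=0.1) > 0)
    # The reward also depends on the confounder, inducing a spurious correlation
    # between action and reward in the logged data.
    r = 2.0 * a + 3.0 * u + rng.normal(scale=0.1)
    return s, a, r                         # u is not recorded in the dataset

data = [generate_observational_step() for _ in range(1000)]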

Surpassing Human-Level Face Verification Performance on LFW with GaussianFace

Dec 20, 2014
Chaochao Lu, Xiaoou Tang

Face verification remains a challenging problem in very complex conditions with large variations such as pose, illumination, expression, and occlusions. This problem is exacerbated when we rely unrealistically on a single training data source, which is often insufficient to cover the intrinsically complex face variations. This paper proposes a principled multi-task learning approach based on the Discriminative Gaussian Process Latent Variable Model, named GaussianFace, to enrich the diversity of training data. In comparison to existing methods, our model exploits additional data from multiple source domains to improve the generalization performance of face verification in an unknown target domain. Importantly, our model can adapt automatically to complex data distributions, and therefore can well capture the complex face variations inherent in multiple sources. Extensive experiments demonstrate the effectiveness of the proposed model in learning from diverse data sources and generalizing to unseen domains. Specifically, our algorithm achieves an impressive accuracy of 98.52% on the well-known and challenging Labeled Faces in the Wild (LFW) benchmark. For the first time, human-level performance in face verification (97.53%) on LFW is surpassed.

* Appearing in Proceedings of the 29th AAAI Conference on Artificial Intelligence (AAAI-15), Oral Presentation 