Jun Zhu

Tsinghua University

Efficient Backpropagation with Variance-Controlled Adaptive Sampling

Feb 27, 2024

Ziteng Wang, Jianfei Chen, Jun Zhu

Abstract: Sampling-based algorithms, which eliminate "unimportant" computations during forward and/or back propagation (BP), offer potential solutions to accelerate neural network training. However, since sampling introduces approximations to training, such algorithms may not consistently maintain accuracy across various tasks. In this work, we introduce a variance-controlled adaptive sampling (VCAS) method designed to accelerate BP. VCAS computes an unbiased stochastic gradient with fine-grained layerwise importance sampling in the data dimension for activation gradient calculation and leverage score sampling in the token dimension for weight gradient calculation. To preserve accuracy, we control the additional variance by learning the sample ratio jointly with model parameters during training. We assessed VCAS on multiple fine-tuning and pre-training tasks in both vision and natural language domains. On all the tasks, VCAS can preserve the original training loss trajectory and validation accuracy while reducing the FLOPs of BP by up to 73.87% and the FLOPs of the whole training process by up to 49.58%. The implementation is available at https://github.com/thu-ml/VCAS.

* ICLR 2024
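A key property VCAS relies on is that sampling keeps the gradient unbiased. A minimal sketch of that idea in plain Python, assuming the standard inverse-probability reweighting trick: each row (for example, one example's activation gradient) is kept with probability p_i and rescaled by 1/p_i. The norm-proportional rule and both function names are hypothetical illustrations, not the VCAS implementation, which additionally learns the sample ratio to control variance and uses leverage score sampling for weight gradients.

```python
import math
import random

def norm_proportional_probs(rows, ratio):
    """Hypothetical rule: keep probability proportional to each row's L2 norm,
    scaled so the expected number of kept rows is at most ratio * len(rows)."""
    norms = [math.sqrt(sum(v * v for v in row)) for row in rows]
    total = sum(norms)
    n = len(rows)
    if total == 0.0:
        return [0.0] * n  # all rows are zero, so dropping them loses nothing
    return [min(1.0, ratio * n * nrm / total) for nrm in norms]

def sample_rows_unbiased(rows, keep_probs, rng):
    """Keep row i with probability p_i and rescale it by 1/p_i; zero it otherwise.

    E[output row i] = p_i * (row_i / p_i) = row_i, so the result is unbiased
    as long as p_i > 0 for every nonzero row."""
    out = []
    for row, p in zip(rows, keep_probs):
        if p > 0.0 and rng.random() < p:
            out.append([v / p for v in row])
        else:
            out.append([0.0] * len(row))
    return out
```

Rows with small norms are dropped most often, which is where backward computation can be skipped, while the 1/p rescaling leaves the expectation unchanged. The price is extra variance, which is what VCAS's learned sample ratio is meant to keep in check.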

C-GAIL: Stabilizing Generative Adversarial Imitation Learning with Control Theory

Feb 26, 2024

Tianjiao Luo, Tim Pearce, Huayu Chen, Jianfei Chen, Jun Zhu

Abstract: Generative Adversarial Imitation Learning (GAIL) trains a generative policy to mimic a demonstrator. It uses on-policy Reinforcement Learning (RL) to optimize a reward signal derived from a GAN-like discriminator. A major drawback of GAIL is its training instability: it inherits the complex training dynamics of GANs and the distribution shift introduced by RL. This can cause oscillations during training, harming its sample efficiency and final policy performance. Recent work has shown that control theory can help with the convergence of a GAN's training. This paper extends this line of work, conducting a control-theoretic analysis of GAIL and deriving a novel controller that not only pushes GAIL to the desired equilibrium but also achieves asymptotic stability in a "one-step" setting. Based on this, we propose a practical algorithm, Controlled-GAIL (C-GAIL). On MuJoCo tasks, our controlled variant speeds up convergence, reduces the range of oscillation, and matches the expert's distribution more closely, for both vanilla GAIL and GAIL-DAC.
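GAIL's policy is optimized against a reward derived from the discriminator. One common choice, sketched below under the assumption that D(s, a) is the discriminator's estimated probability that a state-action pair came from the expert, is r = -log(1 - D(s, a)). Sign and labeling conventions differ between GAIL implementations, `gail_reward` is a hypothetical helper, and the sketch does not include C-GAIL's controller.

```python
import math

def gail_reward(d_prob, eps=1e-8):
    """Reward r = -log(1 - D(s, a)), assuming d_prob = D(s, a) is the
    discriminator's probability that (s, a) came from the expert.

    Clipping to [eps, 1 - eps] keeps the logarithm finite."""
    d = min(max(d_prob, eps), 1.0 - eps)
    return -math.log(1.0 - d)
```

The reward grows as the discriminator becomes more convinced a pair looks expert-like, so the policy's target moves whenever the discriminator is updated; this coupling is the source of the GAN-like dynamics the abstract describes.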

CodeS: Towards Building Open-source Language Models for Text-to-SQL

Feb 26, 2024

Haoyang Li, Jing Zhang, Hanbing Liu, Ju Fan, Xiaokang Zhang, Jun Zhu, Renjie Wei, Hongyan Pan, Cuiping Li, Hong Chen

Abstract: Language models have shown promising performance on the task of translating natural language questions into SQL queries (Text-to-SQL). However, most of the state-of-the-art (SOTA) approaches rely on powerful yet closed-source large language models (LLMs), such as ChatGPT and GPT-4, which can suffer from limitations such as unclear model architectures, data privacy risks, and expensive inference overheads. To address these limitations, we introduce CodeS, a series of pre-trained language models with parameters ranging from 1B to 15B, specifically designed for the text-to-SQL task. CodeS is a fully open-source language model that achieves superior accuracy with much smaller parameter sizes. This paper studies the research challenges in building CodeS. To enhance the SQL generation abilities of CodeS, we adopt an incremental pre-training approach using a specifically curated SQL-centric corpus. Based on this, we address the challenges of schema linking and rapid domain adaptation through strategic prompt construction and a bi-directional data augmentation technique. We conduct comprehensive evaluations on multiple datasets, including the widely used Spider benchmark, the newly released BIRD benchmark, robustness-diagnostic benchmarks such as Spider-DK, Spider-Syn, Spider-Realistic, and Dr.Spider, as well as two real-world datasets created for financial and academic applications. The experimental results show that CodeS achieves new SOTA accuracy and robustness on nearly all challenging text-to-SQL benchmarks.

* Accepted to SIGMOD 2024

BSPA: Exploring Black-box Stealthy Prompt Attacks against Image Generators

Feb 23, 2024

Yu Tian, Xiao Yang, Yinpeng Dong, Heming Yang, Hang Su, Jun Zhu

Abstract: Extremely large image generators offer significant transformative potential across diverse sectors. They allow users to design specific prompts that generate realistic images through black-box APIs. However, some studies reveal that image generators are notably susceptible to attacks and can be induced to generate Not Suitable For Work (NSFW) content by manually designed toxic texts, which are often imperceptible to human observers. A large number of universal and transferable prompts is urgently needed to improve the safety of image generators, especially those released as black-box APIs. Nevertheless, existing approaches to crafting such prompts are constrained by labor-intensive design processes and rely heavily on the quality of the given instructions. To address this, we introduce a black-box stealthy prompt attack (BSPA) that adopts a retriever to simulate attacks from API users. It can effectively harness filter scores to tune the retrieval space of sensitive words for matching the input prompts, thereby crafting stealthy prompts tailored for image generators. Significantly, this approach is model-agnostic and requires no internal access to the model's features, ensuring its applicability to a wide range of image generators. Building on BSPA, we have constructed an automated prompt tool and a comprehensive prompt attack dataset (NSFWeval). Extensive experiments demonstrate that BSPA effectively explores the security vulnerabilities in a variety of available state-of-the-art black-box models, including Stable Diffusion XL, Midjourney, and DALL-E 2/3. Furthermore, we develop a resilient text filter and offer targeted recommendations to secure image generators against prompt attacks in the future.

Your Diffusion Model is Secretly a Certifiably Robust Classifier

Feb 13, 2024

Huanran Chen, Yinpeng Dong, Shitong Shao, Zhongkai Hao, Xiao Yang, Hang Su, Jun Zhu

Abstract: Diffusion models have recently been employed as generative classifiers for robust classification. However, a comprehensive theoretical understanding of the robustness of diffusion classifiers is still lacking, leaving open whether they will be vulnerable to future, stronger attacks. In this study, we propose a new family of diffusion classifiers, named Noised Diffusion Classifiers (NDCs), that possess state-of-the-art certified robustness. Specifically, we generalize the diffusion classifiers to classify Gaussian-corrupted data by deriving the evidence lower bounds (ELBOs) for these distributions, approximating the likelihood using the ELBO, and calculating classification probabilities via Bayes' theorem. We integrate these generalized diffusion classifiers with randomized smoothing to construct smoothed classifiers possessing non-constant Lipschitzness. Experimental results demonstrate the superior certified robustness of our proposed NDCs. Notably, we are the first to achieve 80%+ and 70%+ certified robustness on CIFAR-10 under adversarial perturbations with ℓ2 norm less than 0.25 and 0.5, respectively, using a single off-the-shelf diffusion model without any additional data.
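The final step the abstract describes, turning per-class likelihood estimates into classification probabilities via Bayes' theorem, can be sketched as a numerically stable softmax over log p(x | y) + log p(y). In a diffusion classifier each log p(x | y) would be approximated by a class-conditional ELBO; here those values are simply inputs, a uniform prior is assumed by default, and `bayes_class_probs` is a hypothetical helper rather than the NDC code.

```python
import math

def bayes_class_probs(log_likelihoods, log_priors=None):
    """p(y | x) = exp(log p(x|y) + log p(y)) / sum over y' of the same.

    log_likelihoods[y] stands in for an ELBO-based estimate of log p(x | y).
    A uniform prior is used when log_priors is None."""
    n = len(log_likelihoods)
    if log_priors is None:
        log_priors = [-math.log(n)] * n
    scores = [ll + lp for ll, lp in zip(log_likelihoods, log_priors)]
    m = max(scores)  # log-sum-exp shift: avoids underflow for very negative ELBOs
    exps = [math.exp(s - m) for s in scores]
    z = sum(exps)
    return [e / z for e in exps]
```

The shift by the maximum matters in practice: ELBOs of images are large negative numbers, so exponentiating them directly would underflow to zero for every class.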

Noise Contrastive Alignment of Language Models with Explicit Rewards

Feb 08, 2024

Huayu Chen, Guande He, Hang Su, Jun Zhu

Abstract: User intentions are typically formalized as evaluation rewards to be maximized when fine-tuning language models (LMs). Existing alignment methods, such as Direct Preference Optimization (DPO), are mainly tailored for pairwise preference data where rewards are implicitly defined rather than explicitly given. In this paper, we introduce a general framework for LM alignment, leveraging Noise Contrastive Estimation (NCE) to bridge the gap in handling reward datasets explicitly annotated with scalar evaluations. Our framework comprises two parallel algorithms, NCA and InfoNCA, both enabling the direct extraction of an LM policy from reward data as well as preference data. Notably, we show that the DPO loss is a special case of our proposed InfoNCA objective under pairwise preference settings, thereby integrating and extending current alignment theories. By contrasting NCA and InfoNCA, we show that InfoNCA and DPO adjust relative likelihood across different responses to a single instruction, while NCA optimizes absolute likelihood for each response. We apply our methods to align a 7B language model with a GPT-4 annotated reward dataset. Experimental results suggest that InfoNCA surpasses the DPO baseline in GPT-4 evaluations, while NCA enjoys better training stability with competitive performance.
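The claim that DPO is a special case of InfoNCA can be checked numerically. The sketch below assumes our reading of the InfoNCA objective: a cross-entropy between a target distribution softmax(r_i / alpha) built from the K annotated rewards and the model's distribution softmax(beta * (log pi_theta - log pi_ref)) over the same K responses. Function names and default `alpha`/`beta` values are illustrative assumptions; consult the paper for the exact formulation.

```python
import math

def log_softmax(xs):
    """Numerically stable log-softmax over a list of floats."""
    m = max(xs)
    lse = m + math.log(sum(math.exp(x - m) for x in xs))
    return [x - lse for x in xs]

def infonca_loss(rewards, policy_logps, ref_logps, alpha=1.0, beta=0.1):
    """Cross-entropy between reward-derived targets and the implicit-reward softmax
    for K responses to one instruction."""
    target = [math.exp(v) for v in log_softmax([r / alpha for r in rewards])]
    implicit = [beta * (p - q) for p, q in zip(policy_logps, ref_logps)]
    logq = log_softmax(implicit)
    return -sum(t * lq for t, lq in zip(target, logq))

def dpo_loss(policy_logps, ref_logps, beta=0.1):
    """DPO loss for one pair; index 0 is the preferred response, index 1 the rejected one."""
    margin = beta * ((policy_logps[0] - ref_logps[0]) - (policy_logps[1] - ref_logps[1]))
    return math.log1p(math.exp(-margin))  # equals -log(sigmoid(margin))
```

With K = 2 and a small `alpha`, the target becomes one-hot on the preferred response and `infonca_loss` collapses to `dpo_loss`; with larger `alpha` the target is soft, which is how explicit scalar rewards enter the objective.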

Towards Efficient and Exact Optimization of Language Model Alignment

Feb 02, 2024

Haozhe Ji, Cheng Lu, Yilin Niu, Pei Ke, Hongning Wang, Jun Zhu, Jie Tang, Minlie Huang

Abstract: The alignment of language models with human preferences is vital for their application in real-world tasks. The problem is formulated as optimizing the model's policy to maximize the expected reward that reflects human preferences with minimal deviation from the initial policy. Although reinforcement learning (RL) is considered a straightforward solution, it suffers from high variance in policy updates, which impedes efficient policy improvement. Recently, direct preference optimization (DPO) was proposed to directly optimize the policy from preference data. Though simple to implement, DPO is derived from an optimal policy that is not guaranteed to be attained in practice, which undermines its convergence to the intended solution. In this paper, we propose efficient exact optimization (EXO) of the alignment objective. We prove that EXO is guaranteed to optimize in the same direction as the RL algorithms asymptotically for arbitrary parametrizations of the policy, while enabling efficient optimization by circumventing the complexities associated with RL algorithms. We compare our method to DPO with both theoretical and empirical analyses, and further demonstrate the advantages of our method over existing approaches on realistic human preference data.

* 24 pages, 9 figures
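The alignment objective in the abstract (maximize expected reward while staying close to the initial policy) is usually written as E_pi[r] - beta * KL(pi || pi_ref), and over a finite set of responses its maximizer has the closed form pi*(y) proportional to pi_ref(y) * exp(r(y) / beta). The sketch below computes that target policy; it illustrates what both DPO and EXO aim at, not EXO's training procedure. The KL weight `beta` and the function names are assumptions for illustration.

```python
import math

def kl_regularized_optimum(ref_probs, rewards, beta):
    """pi*(y) ∝ pi_ref(y) * exp(r(y) / beta), the maximizer of
    E_pi[r] - beta * KL(pi || pi_ref) over a finite response set."""
    logits = [math.log(p) + r / beta for p, r in zip(ref_probs, rewards)]
    m = max(logits)  # shift for numerical stability before exponentiating
    w = [math.exp(l - m) for l in logits]
    z = sum(w)
    return [x / z for x in w]

def objective(pi, ref_probs, rewards, beta):
    """Expected reward minus beta times KL(pi || pi_ref); zero-probability terms contribute 0."""
    exp_r = sum(p * r for p, r in zip(pi, rewards))
    kl = sum(p * math.log(p / q) for p, q in zip(pi, ref_probs) if p > 0)
    return exp_r - beta * kl
```

At the optimum the objective equals beta * log(sum of pi_ref(y) * exp(r(y) / beta)), and no other distribution over the same responses scores higher. Smaller `beta` concentrates pi* on high-reward responses; larger `beta` keeps it close to pi_ref.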

Preconditioning for Physics-Informed Neural Networks

Feb 01, 2024

Songming Liu, Chang Su, Jiachen Yao, Zhongkai Hao, Hang Su, Youjia Wu, Jun Zhu

Abstract: Physics-informed neural networks (PINNs) have shown promise in solving various partial differential equations (PDEs). However, training pathologies have negatively affected the convergence and prediction accuracy of PINNs, which further limits their practical applications. In this paper, we propose to use the condition number as a metric to diagnose and mitigate the pathologies in PINNs. Inspired by classical numerical analysis, where the condition number measures sensitivity and stability, we highlight its pivotal role in the training dynamics of PINNs. We prove theorems that reveal how the condition number is related to both the error control and the convergence of PINNs. Subsequently, we present an algorithm that leverages preconditioning to improve the condition number. Evaluations on 18 PDE problems showcase the superior performance of our method. Significantly, in 7 of these problems, our method reduces errors by an order of magnitude. These empirical findings verify the critical role of the condition number in PINNs' training.
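As a generic illustration of why preconditioning improves the condition number (not the paper's PINN-specific preconditioner, which the abstract does not detail), the sketch below uses a badly scaled 2×2 symmetric positive-definite matrix and symmetric Jacobi (diagonal) scaling D^-1/2 A D^-1/2. The matrix and helper names are illustrative assumptions.

```python
import math

def cond_spd_2x2(a, b, c):
    """2-norm condition number lambda_max / lambda_min of the SPD matrix [[a, b], [b, c]]."""
    mean = (a + c) / 2.0
    radius = math.sqrt(((a - c) / 2.0) ** 2 + b * b)
    lam_max, lam_min = mean + radius, mean - radius
    if lam_min <= 0:
        raise ValueError("matrix is not positive definite")
    return lam_max / lam_min

def jacobi_scale_2x2(a, b, c):
    """Symmetric Jacobi preconditioning D^-1/2 A D^-1/2 with D = diag(a, c).

    The diagonal becomes 1 and the off-diagonal becomes b / sqrt(a * c)."""
    s = 1.0 / math.sqrt(a * c)
    return 1.0, b * s, 1.0
```

For A = [[1e4, 1], [1, 1]], the condition number drops from roughly 10^4 to about 1.02 after scaling. A smaller condition number means errors in the system are amplified less and gradient-based iterations converge faster, which is the intuition the abstract carries over to PINN training.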

Schrodinger Bridges Beat Diffusion Models on Text-to-Speech Synthesis

Dec 06, 2023

Zehua Chen, Guande He, Kaiwen Zheng, Xu Tan, Jun Zhu

Abstract: In text-to-speech (TTS) synthesis, diffusion models have achieved promising generation quality. However, because of the pre-defined data-to-noise diffusion process, their prior distribution is restricted to a noisy representation, which provides little information about the generation target. In this work, we present a novel TTS system, Bridge-TTS, making the first attempt to substitute the noisy Gaussian prior in established diffusion-based TTS methods with a clean and deterministic one, which provides strong structural information about the target. Specifically, we leverage the latent representation obtained from the text input as our prior, and build a fully tractable Schrodinger bridge between it and the ground-truth mel-spectrogram, leading to a data-to-data process. Moreover, the tractability and flexibility of our formulation allow us to empirically study design spaces such as noise schedules, as well as to develop stochastic and deterministic samplers. Experimental results on the LJ-Speech dataset illustrate the effectiveness of our method in terms of both synthesis quality and sampling efficiency, significantly outperforming our diffusion counterpart Grad-TTS in 50-step/1000-step synthesis and strong fast TTS models in few-step scenarios. Project page: https://bridge-tts.github.io/
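The "data-to-data" idea can be illustrated with the simplest process pinned at two clean endpoints: a Brownian bridge between a data point x0 (standing in for the mel-spectrogram) at t = 0 and a prior point x1 (standing in for the text latent) at t = 1. This is a minimal sketch assuming a constant noise level `sigma`; Bridge-TTS's actual noise schedules and samplers differ, and `brownian_bridge_sample` is a hypothetical helper.

```python
import math
import random

def brownian_bridge_sample(x0, x1, t, sigma, rng):
    """Sample x_t from a Brownian bridge pinned at x0 (t = 0) and x1 (t = 1).

    Per coordinate: mean (1 - t) * x0 + t * x1, variance sigma^2 * t * (1 - t)."""
    if not 0.0 <= t <= 1.0:
        raise ValueError("t must lie in [0, 1]")
    std = sigma * math.sqrt(t * (1.0 - t))
    return [(1.0 - t) * a + t * b + std * rng.gauss(0.0, 1.0) for a, b in zip(x0, x1)]
```

Unlike a data-to-noise diffusion, both endpoints are deterministic: noise vanishes at t = 0 and t = 1, so the process starts from an informative prior instead of pure Gaussian noise.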

Spatiotemporal Transformer for Imputing Sparse Data: A Deep Learning Approach

Dec 01, 2023

Kehui Yao, Jingyi Huang, Jun Zhu

Abstract: Effective management of environmental resources and agricultural sustainability heavily depends on accurate soil moisture data. However, datasets like the SMAP/Sentinel-1 soil moisture product often contain missing values across their spatiotemporal grid, which poses a significant challenge. This paper introduces a novel Spatiotemporal Transformer model (ST-Transformer) specifically designed to address the issue of missing values in sparse spatiotemporal datasets, particularly focusing on soil moisture data. The ST-Transformer employs multiple spatiotemporal attention layers to capture the complex spatiotemporal correlations in the data and can integrate additional spatiotemporal covariates during the imputation process, thereby enhancing its accuracy. The model is trained using a self-supervised approach, enabling it to autonomously predict missing values from observed data points. Our model's efficacy is demonstrated through its application to the SMAP 1 km soil moisture data over a 36 × 36 km grid in Texas, where it achieves higher accuracy than well-known imputation methods. Additionally, our simulation studies on other datasets highlight the model's broader applicability in various spatiotemporal imputation tasks.
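Self-supervised imputation training typically hides some of the values that *are* observed and asks the model to reconstruct them, so no ground truth for the genuinely missing cells is needed. A minimal sketch of that masking step, assuming a grid with time steps as rows, locations as columns, and `None` for missing values; the masking fraction and `make_self_supervised_mask` are hypothetical, and the ST-Transformer's own masking strategy may differ.

```python
import random

def make_self_supervised_mask(observed, mask_frac, rng):
    """Hide a random fraction of the observed entries (None marks missing data).

    Returns (model_input, targets): model_input is a copy with the hidden entries
    set to None, and targets maps (row, col) to the held-out true value."""
    coords = [(i, j) for i, row in enumerate(observed)
              for j, v in enumerate(row) if v is not None]
    k = int(round(mask_frac * len(coords)))
    hidden = rng.sample(coords, k)
    model_input = [list(row) for row in observed]  # copy so the original grid is untouched
    targets = {}
    for i, j in hidden:
        targets[(i, j)] = observed[i][j]
        model_input[i][j] = None
    return model_input, targets
```

The loss is then computed only on `targets`, so the model learns to fill gaps from the remaining observations; at inference time the same model is applied to the truly missing cells.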
