Yiming Yang

Accelerating Diffusion-based Combinatorial Optimization Solvers by Progressive Distillation

Aug 22, 2023
Junwei Huang, Zhiqing Sun, Yiming Yang

Graph-based diffusion models have shown promising results in generating high-quality solutions to NP-complete (NPC) combinatorial optimization (CO) problems. However, these models are often inefficient at inference due to the iterative nature of the denoising diffusion process. This paper proposes using progressive distillation to speed up inference by taking fewer steps (e.g., forecasting two steps ahead within a single step) during the denoising process. Our experimental results show that the progressively distilled model can perform inference 16 times faster with only 0.019% degradation in performance on the TSP-50 dataset.
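
As a rough illustration of the idea, the sketch below (PyTorch) trains a student to match two teacher denoising steps with a single step; the denoise(x, t) interface and step indexing are illustrative assumptions, not the paper's actual implementation.

    import torch
    import torch.nn.functional as F

    def distill_step(teacher, student, x_t, t, optimizer):
        # Teacher takes two denoising steps; the student learns to reproduce
        # the result in one step. `denoise(x, t)` is an assumed interface.
        with torch.no_grad():
            x_mid = teacher.denoise(x_t, t)          # step t -> t-1
            target = teacher.denoise(x_mid, t - 1)   # step t-1 -> t-2
        pred = student.denoise(x_t, t)               # student jumps t -> t-2 directly
        loss = F.mse_loss(pred, target)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
        return loss.item()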

* Published at ICML 2023, Sampling and Optimization in Discrete Space Workshop. The implementation is at https://github.com/jwrh/Accelerating-Diffusion-based-Combinatorial-Optimization-Solvers-by-Progressive-Distillation 

Efficient Temporal Sentence Grounding in Videos with Multi-Teacher Knowledge Distillation

Aug 07, 2023
Renjie Liang, Yiming Yang, Hui Lu, Li Li

Temporal Sentence Grounding in Videos (TSGV) aims to detect the event timestamps described by a natural language query in untrimmed videos. This paper addresses the challenge of making TSGV models computationally efficient while maintaining high performance. Most existing approaches design complex architectures with extra layers and losses to improve accuracy, which makes them inefficient and heavyweight. The few works that consider efficiency address only the feature-fusion layers, so the rest of the network gains little speed. To tackle this problem, we propose a novel efficient multi-teacher model (EMTM) based on knowledge distillation, which transfers diverse knowledge from both heterogeneous and isomorphic networks. Specifically, we first unify the different outputs of the heterogeneous models into a single form. Next, a Knowledge Aggregation Unit (KAU) is built to acquire high-quality integrated soft labels from multiple teachers; the KAU leverages multi-scale video and global query information to adaptively determine the weights of the different teachers. A Shared Encoder strategy is then proposed to address the problem that the student's shallow layers hardly benefit from the teachers: an isomorphic teacher is trained collaboratively with the student to align their hidden states. Extensive experimental results on three popular TSGV benchmarks demonstrate that our method is both effective and efficient without bells and whistles.
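
A minimal sketch of the soft-label aggregation idea, assuming per-teacher logits over candidate moments and per-sample teacher weights (the real KAU derives these weights from multi-scale video and global query features):

    import torch
    import torch.nn.functional as F

    def aggregate_soft_labels(teacher_logits, teacher_weights):
        # teacher_logits:  [num_teachers, batch, num_moments] unified teacher outputs
        # teacher_weights: [batch, num_teachers] per-sample weights, e.g. predicted
        #                  from video and query features (illustrative)
        probs = F.softmax(teacher_logits, dim=-1)
        w = F.softmax(teacher_weights, dim=-1).permute(1, 0).unsqueeze(-1)
        return (w * probs).sum(dim=0)   # integrated soft labels [batch, num_moments]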


Balancing Exploration and Exploitation in Hierarchical Reinforcement Learning via Latent Landmark Graphs

Jul 22, 2023
Qingyang Zhang, Yiming Yang, Jingqing Ruan, Xuantang Xiong, Dengpeng Xing, Bo Xu

Goal-Conditioned Hierarchical Reinforcement Learning (GCHRL) is a promising paradigm for addressing the exploration-exploitation dilemma in reinforcement learning. It decomposes the source task into subgoal-conditioned subtasks and conducts exploration and exploitation in the subgoal space. The effectiveness of GCHRL relies heavily on the subgoal representation function and the subgoal selection strategy. However, existing works often overlook temporal coherence when learning latent subgoal representations and lack an efficient subgoal selection strategy that balances exploration and exploitation. This paper proposes HIerarchical reinforcement learning via dynamically building Latent Landmark graphs (HILL) to overcome these limitations. HILL learns latent subgoal representations that satisfy temporal coherence using a contrastive representation learning objective. Based on these representations, HILL dynamically builds latent landmark graphs and employs a novelty measure on nodes and a utility measure on edges. Finally, HILL develops a subgoal selection strategy that balances exploration and exploitation by jointly considering both measures. Experimental results demonstrate that HILL outperforms state-of-the-art baselines on continuous control tasks with sparse rewards in terms of sample efficiency and asymptotic performance. Our code is available at https://github.com/papercode2022/HILL.
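
For intuition, the sketch below shows a generic InfoNCE-style objective that pulls together representations of temporally adjacent states; it is a stand-in for HILL's temporal-coherence objective, not the paper's exact loss.

    import torch
    import torch.nn.functional as F

    def temporal_coherence_loss(z_t, z_t1, temperature=0.1):
        # z_t, z_t1: [batch, dim] representations of states s_t and s_{t+1};
        # temporally adjacent pairs (the diagonal of the similarity matrix)
        # act as positives, all other pairs as negatives.
        z_t = F.normalize(z_t, dim=-1)
        z_t1 = F.normalize(z_t1, dim=-1)
        logits = z_t @ z_t1.t() / temperature
        labels = torch.arange(z_t.size(0), device=z_t.device)
        return F.cross_entropy(logits, labels)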

* Accepted by the International Joint Conference on Neural Networks (IJCNN) 2023 

PESCO: Prompt-enhanced Self Contrastive Learning for Zero-shot Text Classification

May 24, 2023
Yau-Shian Wang, Ta-Chung Chi, Ruohong Zhang, Yiming Yang

We present PESCO, a novel contrastive learning framework that substantially improves the performance of zero-shot text classification. We formulate text classification as a neural text matching problem where each document is treated as a query, and the system learns the mapping from each query to the relevant class labels by (1) adding prompts to enhance label matching, and (2) using retrieved labels to enrich the training set in a self-training loop of contrastive learning. PESCO achieves state-of-the-art performance on four benchmark text classification datasets. On DBpedia, we achieve 98.5% accuracy without any labeled data, which is close to the fully-supervised result. Extensive experiments and analyses show all the components of PESCO are necessary for improving the performance of zero-shot text classification.
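
The label-matching view can be sketched as follows, assuming a generic sentence encoder; the model name and prompt template are illustrative, and PESCO additionally self-trains the encoder with contrastive learning on retrieved labels.

    from sentence_transformers import SentenceTransformer, util

    encoder = SentenceTransformer("all-mpnet-base-v2")   # any sentence encoder works here
    labels = ["sports", "business", "technology", "politics"]
    prompted = [f"This article is about {label}." for label in labels]  # prompt-enhanced labels

    def classify(document: str) -> str:
        # Treat the document as a query and match it against the prompted labels.
        doc_emb = encoder.encode(document, convert_to_tensor=True)
        label_emb = encoder.encode(prompted, convert_to_tensor=True)
        scores = util.cos_sim(doc_emb, label_emb)        # [1, num_labels]
        return labels[int(scores.argmax())]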

* Accepted by ACL 2023 

Policy Representation via Diffusion Probability Model for Reinforcement Learning

May 22, 2023
Long Yang, Zhixiong Huang, Fenghao Lei, Yucun Zhong, Yiming Yang, Cong Fang, Shiting Wen, Binbin Zhou, Zhouchen Lin

Popular reinforcement learning (RL) algorithms tend to produce a unimodal policy distribution, which limits the expressiveness of complicated policies and weakens exploration. Diffusion probability models are powerful at learning complicated multimodal distributions and have shown promising applications to RL. In this paper, we build a theoretical foundation for policy representation via the diffusion probability model and provide practical implementations of diffusion policy for online model-free RL. Concretely, we characterize the diffusion policy as a stochastic process, which is a new approach to representing a policy. We then present a convergence guarantee for the diffusion policy, which provides a theory for understanding its multimodality. Furthermore, we propose DIPO, an implementation of model-free online RL with DIffusion POlicy. To the best of our knowledge, DIPO is the first algorithm to solve model-free online RL problems with a diffusion model. Finally, extensive empirical results show the effectiveness and superiority of DIPO on the standard continuous-control MuJoCo benchmark.
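
As a rough sketch of what sampling an action from a diffusion policy looks like, the code below runs a standard DDPM-style reverse chain conditioned on the state; the noise-prediction network interface and the noise schedule are assumptions for illustration, not DIPO's exact formulation.

    import torch

    @torch.no_grad()
    def sample_action(noise_net, state, action_dim=6, num_steps=20):
        # Standard DDPM reverse chain, conditioned on the state.
        betas = torch.linspace(1e-4, 2e-2, num_steps)
        alphas = 1.0 - betas
        alpha_bars = torch.cumprod(alphas, dim=0)
        a = torch.randn(1, action_dim)                    # start from pure Gaussian noise
        for t in reversed(range(num_steps)):
            eps = noise_net(a, state, torch.tensor([t]))  # predicted noise (assumed interface)
            a = (a - betas[t] / torch.sqrt(1.0 - alpha_bars[t]) * eps) / torch.sqrt(alphas[t])
            if t > 0:
                a = a + torch.sqrt(betas[t]) * torch.randn_like(a)
        return a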


Let's Sample Step by Step: Adaptive-Consistency for Efficient Reasoning with LLMs

May 19, 2023
Pranjal Aggarwal, Aman Madaan, Yiming Yang, Mausam

A popular approach for improving the correctness of output from large language models (LLMs) is Self-Consistency: poll the LLM multiple times and output the most frequent solution. Existing Self-Consistency techniques always draw a constant number of samples per question, whereas a better approach would be to non-uniformly distribute the available budget based on the amount of agreement among the samples drawn so far. In response, we introduce Adaptive-Consistency, a cost-efficient, model-agnostic technique that dynamically adjusts the number of samples per question using a lightweight stopping criterion. Our experiments over 13 datasets and two LLMs demonstrate that Adaptive-Consistency reduces the sample budget by up to 6.0 times with an average accuracy drop of less than 0.1%.
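
A minimal sketch of the idea, using a simple majority-agreement threshold as the stopping rule; the paper's actual criterion is a lightweight probabilistic test, and sample_fn here stands for one LLM call with the same prompt.

    from collections import Counter

    def adaptive_consistency(sample_fn, max_samples=40, threshold=0.95, min_samples=3):
        # Draw answers one at a time; stop once the most frequent answer holds
        # a large enough share of the samples drawn so far.
        answers = []
        for _ in range(max_samples):
            answers.append(sample_fn())                  # one LLM call with the same prompt
            answer, count = Counter(answers).most_common(1)[0]
            if len(answers) >= min_samples and count / len(answers) >= threshold:
                break
        return answer, len(answers)                      # majority answer, samples used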


Active Retrieval Augmented Generation

May 11, 2023
Zhengbao Jiang, Frank F. Xu, Luyu Gao, Zhiqing Sun, Qian Liu, Jane Dwivedi-Yu, Yiming Yang, Jamie Callan, Graham Neubig

Despite the remarkable ability of large language models (LMs) to comprehend and generate language, they have a tendency to hallucinate and produce factually inaccurate output. Augmenting LMs by retrieving information from external knowledge resources is one promising solution. Most existing retrieval-augmented LMs employ a retrieve-and-generate setup that retrieves information only once, based on the input. This is limiting, however, in more general scenarios involving the generation of long texts, where continually gathering information throughout the generation process is essential. Past efforts to retrieve information multiple times while generating mostly retrieve documents at fixed intervals using the previous context as the query. In this work, we provide a generalized view of active retrieval augmented generation: methods that actively decide when and what to retrieve over the course of generation. We propose Forward-Looking Active REtrieval augmented generation (FLARE), a generic retrieval-augmented generation method that iteratively uses a prediction of the upcoming sentence to anticipate future content; the prediction is then used as a query to retrieve relevant documents and regenerate the sentence if it contains low-confidence tokens. We test FLARE along with baselines comprehensively over 4 long-form knowledge-intensive generation tasks/datasets. FLARE achieves superior or competitive performance on all tasks, demonstrating the effectiveness of our method. Code and datasets are available at https://github.com/jzbjyb/FLARE.
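
In pseudocode form, the active retrieval loop might look like the sketch below; the lm and retriever interfaces and the confidence threshold are hypothetical placeholders, not the released implementation.

    def flare_generate(lm, retriever, question, max_sentences=10, conf_threshold=0.8):
        # lm.generate_sentence returns the next sentence and its token probabilities;
        # retriever.search returns documents for a query. Both are assumed interfaces.
        answer = ""
        for _ in range(max_sentences):
            draft, token_probs = lm.generate_sentence(question + answer)
            if min(token_probs) < conf_threshold:        # low-confidence token detected
                docs = retriever.search(draft)           # use the draft sentence as the query
                draft, token_probs = lm.generate_sentence(question + answer, context=docs)
            answer += draft
            if lm.is_finished(answer):
                break
        return answer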


Principle-Driven Self-Alignment of Language Models from Scratch with Minimal Human Supervision

May 04, 2023
Zhiqing Sun, Yikang Shen, Qinhong Zhou, Hongxin Zhang, Zhenfang Chen, David Cox, Yiming Yang, Chuang Gan

Recent AI-assistant agents, such as ChatGPT, predominantly rely on supervised fine-tuning (SFT) with human annotations and reinforcement learning from human feedback (RLHF) to align the output of large language models (LLMs) with human intentions, ensuring they are helpful, ethical, and reliable. However, this dependence can significantly constrain the true potential of AI-assistant agents due to the high cost of obtaining human supervision and the related issues of quality, reliability, diversity, self-consistency, and undesirable biases. To address these challenges, we propose a novel approach called SELF-ALIGN, which combines principle-driven reasoning and the generative power of LLMs for the self-alignment of AI agents with minimal human supervision. Our approach encompasses four stages: first, we use an LLM to generate synthetic prompts, and a topic-guided method to augment the prompt diversity; second, we use a small set of human-written principles for AI models to follow, and guide the LLM through in-context learning from demonstrations (of how the principles are applied) to produce helpful, ethical, and reliable responses to users' queries; third, we fine-tune the original LLM with the high-quality self-aligned responses so that the resulting model can generate desirable responses for each query directly, without the principle set and the demonstrations; and finally, we apply a refinement step to address issues of overly brief or indirect responses. Applying SELF-ALIGN to the LLaMA-65b base language model, we develop an AI assistant named Dromedary. With fewer than 300 lines of human annotations (including < 200 seed prompts, 16 generic principles, and 5 exemplars for in-context learning), Dromedary significantly surpasses the performance of several state-of-the-art AI systems, including Text-Davinci-003 and Alpaca, on benchmark datasets with various settings.
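
A highly simplified sketch of the principle-driven in-context stage; the base_lm.generate interface and prompt layout are assumptions, and the actual principles and demonstrations are those released with the paper.

    def self_align_response(base_lm, principles, demonstrations, user_query):
        # Stage 2 in miniature: prepend the human-written principles and a few
        # in-context demonstrations so the (unaligned) base LLM produces a helpful,
        # ethical response; the (query, response) pairs later serve as fine-tuning data.
        prompt = f"{principles}\n\n{demonstrations}\n\nUser: {user_query}\nAssistant:"
        return base_lm.generate(prompt)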

* Project page: https://mitibmdemos.draco.res.ibm.com/dromedary 

Generation-driven Contrastive Self-training for Zero-shot Text Classification with Instruction-tuned GPT

Apr 24, 2023
Ruohong Zhang, Yau-Shian Wang, Yiming Yang

GPT-based zero-shot classification models tend to make independent predictions over test instances, which can be sub-optimal because instance correlations and the decision boundaries in the target space are ignored. To address these difficulties and limitations, we propose a new approach to zero-shot text classification, GenCo, which leverages the strong generative power of GPT to help train a smaller, more adaptable, and efficient sentence-encoder classifier with contrastive self-training. Specifically, GenCo applies GPT in two ways: first, it generates multiple augmented texts for each input instance to enhance the semantic embedding of the instance and improve the mapping to relevant labels; second, it generates augmented texts conditioned on the predicted label during self-training, which tailors the generative process to the decision boundaries in the target space. In our experiments, GenCo outperforms previous state-of-the-art methods on multiple benchmark datasets, even when only limited in-domain text data is available.
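
A sketch of the two augmentation modes, with illustrative prompt wording and an assumed llm.generate interface:

    def augment_with_llm(llm, text, label=None, n=3):
        # Two augmentation modes: (1) label-free paraphrases of the input instance,
        # (2) label-conditioned continuations used during self-training.
        if label is None:
            prompt = f"Paraphrase the following text:\n{text}"
        else:
            prompt = f"Write a short passage about '{label}' that continues:\n{text}"
        return [llm.generate(prompt) for _ in range(n)]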
