Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Chonghua Liao

Thought-Augmented Policy Optimization: Bridging External Guidance and Internal Capabilities

May 21, 2025

Jinyang Wu, Chonghua Liao, Mingkuan Feng, Shuai Zhang, Zhengqi Wen, Pengpeng Shao, Huazhe Xu, Jianhua Tao

Abstract:Reinforcement learning (RL) has emerged as an effective method for training reasoning models. However, existing RL approaches typically bias the model's output distribution toward reward-maximizing paths without introducing external knowledge. This limits their exploration capacity and results in a narrower reasoning capability boundary compared to base models. To address this limitation, we propose TAPO (Thought-Augmented Policy Optimization), a novel framework that augments RL by incorporating external high-level guidance ("thought patterns"). By adaptively integrating structured thoughts during training, TAPO effectively balances model-internal exploration and external guidance exploitation. Extensive experiments show that our approach significantly outperforms GRPO by 99% on AIME, 41% on AMC, and 17% on Minerva Math. Notably, these high-level thought patterns, abstracted from only 500 prior samples, generalize effectively across various tasks and models. This highlights TAPO's potential for broader applications across multiple tasks and domains. Our further analysis reveals that introducing external guidance produces powerful reasoning models with superior explainability of inference behavior and enhanced output readability.

Via

Access Paper or Ask Questions

Exploring Forgetting in Large Language Model Pre-Training

Oct 22, 2024

Chonghua Liao, Ruobing Xie, Xingwu Sun, Haowen Sun, Zhanhui Kang

Figure 1 for Exploring Forgetting in Large Language Model Pre-Training

Figure 2 for Exploring Forgetting in Large Language Model Pre-Training

Figure 3 for Exploring Forgetting in Large Language Model Pre-Training

Figure 4 for Exploring Forgetting in Large Language Model Pre-Training

Abstract:Catastrophic forgetting remains a formidable obstacle to building an omniscient model in large language models (LLMs). Despite the pioneering research on task-level forgetting in LLM fine-tuning, there is scant focus on forgetting during pre-training. We systematically explored the existence and measurement of forgetting in pre-training, questioning traditional metrics such as perplexity (PPL) and introducing new metrics to better detect entity memory retention. Based on our revised assessment of forgetting metrics, we explored low-cost, straightforward methods to mitigate forgetting during the pre-training phase. Further, we carefully analyzed the learning curves, offering insights into the dynamics of forgetting. Extensive evaluations and analyses on forgetting of pre-training could facilitate future research on LLMs.

Via

Access Paper or Ask Questions

Improving Discriminative Multi-Modal Learning with Large-Scale Pre-Trained Models

Oct 08, 2023

Chenzhuang Du, Yue Zhao, Chonghua Liao, Jiacheng You, Jie Fu, Hang Zhao

Figure 1 for Improving Discriminative Multi-Modal Learning with Large-Scale Pre-Trained Models

Figure 2 for Improving Discriminative Multi-Modal Learning with Large-Scale Pre-Trained Models

Figure 3 for Improving Discriminative Multi-Modal Learning with Large-Scale Pre-Trained Models

Figure 4 for Improving Discriminative Multi-Modal Learning with Large-Scale Pre-Trained Models

Abstract:This paper investigates how to better leverage large-scale pre-trained uni-modal models to further enhance discriminative multi-modal learning. Even when fine-tuned with only uni-modal data, these models can outperform previous multi-modal models in certain tasks. It's clear that their incorporation into multi-modal learning would significantly improve performance. However, multi-modal learning with these models still suffers from insufficient learning of uni-modal features, which weakens the resulting multi-modal model's generalization ability. While fine-tuning uni-modal models separately and then aggregating their predictions is straightforward, it doesn't allow for adequate adaptation between modalities, also leading to sub-optimal results. To this end, we introduce Multi-Modal Low-Rank Adaptation learning (MMLoRA). By freezing the weights of uni-modal fine-tuned models, adding extra trainable rank decomposition matrices to them, and subsequently performing multi-modal joint training, our method enhances adaptation between modalities and boosts overall performance. We demonstrate the effectiveness of MMLoRA on three dataset categories: audio-visual (e.g., AVE, Kinetics-Sound, CREMA-D), vision-language (e.g., MM-IMDB, UPMC Food101), and RGB-Optical Flow (UCF101).

Via

Access Paper or Ask Questions

Zero-Label Prompt Selection

Nov 09, 2022

Chonghua Liao, Yanan Zheng, Zhilin Yang

Figure 1 for Zero-Label Prompt Selection

Figure 2 for Zero-Label Prompt Selection

Figure 3 for Zero-Label Prompt Selection

Figure 4 for Zero-Label Prompt Selection

Abstract:Natural language prompts have been shown to facilitate cross-task generalization for large language models. However, with no or limited labeled examples, the cross-task performance is highly sensitive to the choice of prompts, while selecting a high-performing prompt is challenging given the scarcity of labels. To address the issue, we propose a Zero-Label Prompt Selection (ZPS) method that selects prompts without any labeled data or gradient update. Specifically, given the candidate human-written prompts for a task, ZPS labels a set of unlabeled data with a prompt ensemble and uses the pseudo-labels for prompt selection. Experiments show that ZPS improves over prior methods by a sizeable margin in zero-label performance. We also extend ZPS to a few-shot setting and show its advantages over strong baselines such as prompt tuning and model tuning.

Via

Access Paper or Ask Questions

Locally Differentially Private Reinforcement Learning for Linear Mixture Markov Decision Processes

Oct 19, 2021

Chonghua Liao, Jiafan He, Quanquan Gu

Figure 1 for Locally Differentially Private Reinforcement Learning for Linear Mixture Markov Decision Processes

Figure 2 for Locally Differentially Private Reinforcement Learning for Linear Mixture Markov Decision Processes

Abstract:Reinforcement learning (RL) algorithms can be used to provide personalized services, which rely on users' private and sensitive data. To protect the users' privacy, privacy-preserving RL algorithms are in demand. In this paper, we study RL with linear function approximation and local differential privacy (LDP) guarantees. We propose a novel $(\varepsilon, \delta)$-LDP algorithm for learning a class of Markov decision processes (MDPs) dubbed linear mixture MDPs, and obtains an $\tilde{\mathcal{O}}( d^{5/4}H^{7/4}T^{3/4}\left(\log(1/\delta)\right)^{1/4}\sqrt{1/\varepsilon})$ regret, where $d$ is the dimension of feature mapping, $H$ is the length of the planning horizon, and $T$ is the number of interactions with the environment. We also prove a lower bound $\Omega(dH\sqrt{T}/\left(e^{\varepsilon}(e^{\varepsilon}-1)\right))$ for learning linear mixture MDPs under $\varepsilon$-LDP constraint. Experiments on synthetic datasets verify the effectiveness of our algorithm. To the best of our knowledge, this is the first provable privacy-preserving RL algorithm with linear function approximation.

* 25 pages, 2 figures

Via

Access Paper or Ask Questions