Xinyang Lu

De-attribute to Forget for LLM Unlearning

May 29, 2026

Xinyang Lu, Jiabao Pan, Rachael Hwee Ling Sim, See-Kiong Ng, Anthony Kum Hoe Tung, Bryan Kian Hsiang Low

Abstract: The rapid development of large language models (LLMs) has raised concerns on the use of inappropriate data for training, which has led to a growing interest in LLM unlearning. Many existing LLM unlearning approaches rely on optimizing prediction loss(es), such as maximizing the loss on the forget set, but often face critical issues like over-forgetting and poor model utility. To address them, this paper novelly frames the optimization objective for LLM unlearning as one of zeroing out data attribution instead. In particular, we propose the first LLM unlearning framework based on data attribution rewards called DareU that performs reinforcement learning to update the LLM by reducing the attribution score of its generated responses (i.e., de-attributing) to the forget data owners. Empirical evaluation using an LLM classifier as an efficient approximation of attribution shows that DareU outperforms existing baselines by achieving effective unlearning while balancing forget quality and model utility well.

WaterDrum: Watermarking for Data-centric Unlearning Metric

May 08, 2025

Xinyang Lu, Xinyuan Niu, Gregory Kang Ruey Lau, Bui Thi Cam Nhung, Rachael Hwee Ling Sim, Fanyu Wen, Chuan-Sheng Foo, See-Kiong Ng, Bryan Kian Hsiang Low

Abstract: Large language model (LLM) unlearning is critical in real-world applications where it is necessary to efficiently remove the influence of private, copyrighted, or harmful data from some users. However, existing utility-centric unlearning metrics (based on model utility) may fail to accurately evaluate the extent of unlearning in realistic settings such as when (a) the forget and retain set have semantically similar content, (b) retraining the model from scratch on the retain set is impractical, and/or (c) the model owner can improve the unlearning metric without directly performing unlearning on the LLM. This paper presents the first data-centric unlearning metric for LLMs called WaterDrum that exploits robust text watermarking for overcoming these limitations. We also introduce new benchmark datasets for LLM unlearning that contain varying levels of similar data points and can be used to rigorously evaluate unlearning algorithms using WaterDrum. Our code is available at https://github.com/lululu008/WaterDrum and our new benchmark datasets are released at https://huggingface.co/datasets/Glow-AI/WaterDrum-Ax.

Global-to-Local Support Spectrums for Language Model Explainability

Aug 12, 2024

Lucas Agussurja, Xinyang Lu, Bryan Kian Hsiang Low

Abstract: Existing sample-based methods, like influence functions and representer points, measure the importance of a training point by approximating the effect of its removal from training. As such, they are skewed towards outliers and points that are very close to the decision boundaries. The explanations provided by these methods are often static and not specific enough for different test points. In this paper, we propose a method to generate an explanation in the form of support spectrums which are based on two main ideas: the support sets and a global-to-local importance measure. The support set is the set of training points, in the predicted class, that "lie in between" the test point and training points in the other classes. They indicate how well the test point can be distinguished from the points not in the predicted class. The global-to-local importance measure is obtained by decoupling existing methods into the global and local components which are then used to select the points in the support set. Using this method, we are able to generate explanations that are tailored to specific test points. In the experiments, we show the effectiveness of the method in image classification and text generation tasks.

TRACE: TRansformer-based Attribution using Contrastive Embeddings in LLMs

Jul 06, 2024

Cheng Wang, Xinyang Lu, See-Kiong Ng, Bryan Kian Hsiang Low

Abstract: The rapid evolution of large language models (LLMs) represents a substantial leap forward in natural language understanding and generation. However, alongside these advancements come significant challenges related to the accountability and transparency of LLM responses. Reliable source attribution is essential to adhering to stringent legal and regulatory standards, including those set forth by the General Data Protection Regulation. Despite the well-established methods in source attribution within the computer vision domain, the application of robust attribution frameworks to natural language processing remains underexplored. To bridge this gap, we propose a novel and versatile TRansformer-based Attribution framework using Contrastive Embeddings called TRACE that, in particular, exploits contrastive learning for source attribution. We perform an extensive empirical evaluation to demonstrate the performance and efficiency of TRACE in various settings and show that TRACE significantly improves the ability to attribute sources accurately, making it a valuable tool for enhancing the reliability and trustworthiness of LLMs.

On Newton's Method to Unlearn Neural Networks

Jun 20, 2024

Nhung Bui, Xinyang Lu, See-Kiong Ng, Bryan Kian Hsiang Low

Abstract: Machine unlearning facilitates personal data ownership, including the "right to be forgotten". The proliferation of applications of neural networks (NNs) trained on users' personal data calls for the need to develop algorithms to unlearn an NN. Since retraining is costly, efficiency is often achieved through approximate unlearning which aims to unlearn a trained NN to be close to the retrained one (in distribution). Though Newton's method has been used by previous works to approximately unlearn linear models, adapting it for unlearning an NN often encounters degenerate Hessians that make computing the Newton update impossible. In this paper, we will first show that when coupled with naive yet often effective solutions to mitigate the degeneracy issue for unlearning, Newton's method surprisingly suffers from catastrophic forgetting. To overcome this difficulty, we revise Newton's method to include a theoretically justified regularizer and propose a cubic-regularized Newton's method for unlearning an NN. The cubic regularizer comes with the benefits of not requiring manual finetuning and affording a natural interpretation. Empirical evaluation on several models and real-world datasets shows that our method is more resilient to catastrophic forgetting and performs better than the baselines, especially in sequential unlearning.
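
To make the degeneracy issue in this abstract concrete, here is a minimal one-dimensional sketch of a generic cubic-regularized Newton step (in the style of Nesterov and Polyak). It is not the paper's unlearning algorithm: the scalar setting, the function name `cubic_newton_step_1d`, and the regularization constant `M` are all assumptions made for illustration. The step minimizes the local model g*s + 0.5*h*s^2 + (M/6)*|s|^3, which stays well defined even when the curvature h is zero and the plain Newton step -g/h is not.

```python
import math

def cubic_newton_step_1d(g, h, M=1.0):
    """Minimizer of m(s) = g*s + 0.5*h*s**2 + (M/6)*abs(s)**3 for scalar s.

    g: gradient, h: curvature (may be zero or negative), M > 0: cubic weight.
    All names are illustrative assumptions, not taken from the paper.
    """
    if M <= 0:
        raise ValueError("M must be positive")
    if g == 0:
        # Already stationary with respect to the gradient; take no step.
        return 0.0
    # Writing s = -sign(g) * t with t >= 0, stationarity of m gives
    # (M/2)*t**2 + h*t - abs(g) = 0, whose unique positive root is:
    t = (-h + math.sqrt(h * h + 2.0 * M * abs(g))) / M
    return -math.copysign(t, g)

# Degenerate curvature: the plain Newton step -g/h would divide by zero,
# while the cubic-regularized step is finite.
step = cubic_newton_step_1d(g=2.0, h=0.0, M=1.0)
```

A larger `M` shrinks the step, so the cubic term acts as a safeguard that adapts to the gradient size instead of relying on a hand-tuned damping constant added to the curvature.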

WASA: WAtermark-based Source Attribution for Large Language Model-Generated Data

Oct 01, 2023

Jingtan Wang, Xinyang Lu, Zitong Zhao, Zhongxiang Dai, Chuan-Sheng Foo, See-Kiong Ng, Bryan Kian Hsiang Low

Abstract: The impressive performances of large language models (LLMs) and their immense potential for commercialization have given rise to serious concerns over the intellectual property (IP) of their training data. In particular, the synthetic texts generated by LLMs may infringe the IP of the data being used to train the LLMs. To this end, it is imperative to be able to (a) identify the data provider who contributed to the generation of a synthetic text by an LLM (source attribution) and (b) verify whether the text data from a data provider has been used to train an LLM (data provenance). In this paper, we show that both problems can be solved by watermarking, i.e., by enabling an LLM to generate synthetic texts with embedded watermarks that contain information about their source(s). We identify the key properties of such watermarking frameworks (e.g., source attribution accuracy, robustness against adversaries), and propose a WAtermarking for Source Attribution (WASA) framework that satisfies these key properties due to our algorithmic designs. Our WASA framework enables an LLM to learn an accurate mapping from the texts of different data providers to their corresponding unique watermarks, which sets the foundation for effective source attribution (and hence data provenance). Extensive empirical evaluations show that our WASA framework achieves effective source attribution and data provenance.

Action and Trajectory Planning for Urban Autonomous Driving with Hierarchical Reinforcement Learning

Jun 28, 2023

Xinyang Lu, Flint Xiaofeng Fan, Tianying Wang

Abstract: Reinforcement Learning (RL) has made promising progress in planning and decision-making for Autonomous Vehicles (AVs) in simple driving scenarios. However, existing RL algorithms for AVs fail to learn critical driving skills in complex urban scenarios. First, urban driving scenarios require AVs to handle multiple driving tasks of which conventional RL algorithms are incapable. Second, the presence of other vehicles in urban scenarios results in a dynamically changing environment, which challenges RL algorithms to plan the action and trajectory of the AV. In this work, we propose an action and trajectory planner using a Hierarchical Reinforcement Learning method (atHRL), which models the agent behavior in a hierarchical model by using the perception of the lidar and bird's-eye view. The proposed atHRL method learns to make decisions about the agent's future trajectory and computes target waypoints under continuous settings based on a hierarchical DDPG algorithm. The waypoints planned by the atHRL model are then sent to a low-level controller to generate the steering and throttle commands required for the vehicle maneuver. We empirically verify the efficacy of atHRL through extensive experiments in complex urban driving scenarios that compose multiple tasks with the presence of other vehicles in the CARLA simulator. The experimental results suggest a significant performance improvement compared to the state-of-the-art RL methods.

* ICML Workshop on New Frontiers in Learning, Control, and Dynamical Systems

Can LLMs Express Their Uncertainty? An Empirical Evaluation of Confidence Elicitation in LLMs

Jun 22, 2023

Miao Xiong, Zhiyuan Hu, Xinyang Lu, Yifei Li, Jie Fu, Junxian He, Bryan Hooi

Abstract: The task of empowering large language models (LLMs) to accurately express their confidence, referred to as confidence elicitation, is essential in ensuring reliable and trustworthy decision-making processes. Previous methods, which primarily rely on model logits, have become less suitable for LLMs and even infeasible with the rise of closed-source LLMs (e.g., commercialized LLM APIs). This leads to a growing need to explore the untapped area of non-logit-based approaches to estimate the uncertainty of LLMs. Hence, in this study, we investigate approaches for confidence elicitation that do not require model fine-tuning or access to proprietary information. We introduce three categories of methods: verbalize-based, consistency-based, and their hybrid methods for benchmarking, and evaluate their performance across five types of datasets and four widely-used LLMs. Our analysis of these methods uncovers several key insights: 1) LLMs often exhibit a high degree of overconfidence when verbalizing their confidence; 2) Prompting strategies such as CoT, Top-K and Multi-step confidences improve calibration of verbalized confidence; 3) Consistency-based methods outperform the verbalized confidences in most cases, with particularly notable improvements on the arithmetic reasoning task; 4) Hybrid methods consistently deliver the best performance over their baselines, thereby emerging as a promising state-of-the-art approach; 5) Despite these advancements, all investigated methods continue to struggle with challenging tasks, such as those requiring professional knowledge, leaving significant scope for improvement of confidence elicitation.

* 11 Pages
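
As a rough illustration of the consistency-based idea named in the abstract above, the sketch below scores confidence as the fraction of sampled answers that agree with the majority answer. This is one simple agreement measure and is not claimed to match the paper's exact estimators. The function name `consistency_confidence` and the string normalization are assumptions, and the sampled answers are hard-coded here in place of real LLM calls.

```python
from collections import Counter

def consistency_confidence(answers):
    """Return (majority_answer, agreement_rate) for a list of sampled answers.

    answers: strings sampled from a model for the same question.
    Normalization (strip + lowercase) is an illustrative assumption.
    """
    if not answers:
        raise ValueError("need at least one sampled answer")
    normalized = [a.strip().lower() for a in answers]
    # Most frequent normalized answer and how often it occurred.
    top_answer, freq = Counter(normalized).most_common(1)[0]
    return top_answer, freq / len(normalized)

# Four hypothetical samples for one arithmetic question.
answer, confidence = consistency_confidence(["42", "42 ", "41", "42"])
```

Here three of the four samples agree after normalization, so the majority answer is "42" with an agreement rate of 0.75. No access to model logits is needed, which is why this kind of measure also works with closed-source APIs.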
