Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Li Xiong

HARBOR: Exploring Persona Dynamics in Multi-Agent Competition

Feb 17, 2025

Kenan Jiang, Li Xiong, Fei Liu

Abstract:We investigate factors contributing to LLM agents' success in competitive multi-agent environments, using auctions as a testbed where agents bid to maximize profit. The agents are equipped with bidding domain knowledge, distinct personas that reflect item preferences, and a memory of auction history. Our work extends the classic auction scenario by creating a realistic environment where multiple agents bid on houses, weighing aspects such as size, location, and budget to secure the most desirable homes at the lowest prices. Particularly, we investigate three key questions: (a) How does a persona influence an agent's behavior in a competitive setting? (b) Can an agent effectively profile its competitors' behavior during auctions? (c) How can persona profiling be leveraged to create an advantage using strategies such as theory of mind? Through a series of experiments, we analyze the behaviors of LLM agents and shed light on new findings. Our testbed, called HARBOR, offers a valuable platform for deepening our understanding of multi-agent workflows in competitive environments.

Via

Access Paper or Ask Questions

A Survey on Data Markets

Nov 09, 2024

Jiayao Zhang, Yuran Bi, Mengye Cheng, Jinfei Liu, Kui Ren, Qiheng Sun, Yihang Wu, Yang Cao, Raul Castro Fernandez, Haifeng Xu(+8 more)

Abstract:Data is the new oil of the 21st century. The growing trend of trading data for greater welfare has led to the emergence of data markets. A data market is any mechanism whereby the exchange of data products including datasets and data derivatives takes place as a result of data buyers and data sellers being in contact with one another, either directly or through mediating agents. It serves as a coordinating mechanism by which several functions, including the pricing and the distribution of data as the most important ones, interact to make the value of data fully exploited and enhanced. In this article, we present a comprehensive survey of this important and emerging direction from the aspects of data search, data productization, data transaction, data pricing, revenue allocation as well as privacy, security, and trust issues. We also investigate the government policies and industry status of data markets across different countries and different domains. Finally, we identify the unresolved challenges and discuss possible future directions for the development of data markets.

Via

Access Paper or Ask Questions

Geo-Llama: Leveraging LLMs for Human Mobility Trajectory Generation with Spatiotemporal Constraints

Aug 25, 2024

Siyu Li, Toan Tran, Haowen Lin, John Khrumm, Cyrus Shahabi, Li Xiong

Figure 1 for Geo-Llama: Leveraging LLMs for Human Mobility Trajectory Generation with Spatiotemporal Constraints

Figure 2 for Geo-Llama: Leveraging LLMs for Human Mobility Trajectory Generation with Spatiotemporal Constraints

Figure 3 for Geo-Llama: Leveraging LLMs for Human Mobility Trajectory Generation with Spatiotemporal Constraints

Figure 4 for Geo-Llama: Leveraging LLMs for Human Mobility Trajectory Generation with Spatiotemporal Constraints

Abstract:Simulating human mobility data is essential for various application domains, including transportation, urban planning, and epidemic control, since real data are often inaccessible to researchers due to expensive costs and privacy issues. Several existing deep generative solutions propose learning from real trajectories to generate synthetic ones. Despite the progress, most of them suffer from training stability issues and scale poorly with growing data size. More importantly, they generally lack control mechanisms to steer the generated trajectories based on spatiotemporal constraints such as fixing specific visits. To address such limitations, we formally define the controlled trajectory generation problem with spatiotemporal constraints and propose Geo-Llama. This novel LLM-inspired framework enforces explicit visit constraints in a contextually coherent way. It fine-tunes pre-trained LLMs on trajectories with a visit-wise permutation strategy where each visit corresponds to a time and location. This enables the model to capture the spatiotemporal patterns regardless of visit orders and allows flexible and in-context constraint integration through prompts during generation. Extensive experiments on real-world and synthetic datasets validate the effectiveness of Geo-Llama, demonstrating its versatility and robustness in handling a broad range of constraints to generate more realistic trajectories compared to existing methods.

Via

Access Paper or Ask Questions

TabularMark: Watermarking Tabular Datasets for Machine Learning

Jun 21, 2024

Yihao Zheng, Haocheng Xia, Junyuan Pang, Jinfei Liu, Kui Ren, Lingyang Chu, Yang Cao, Li Xiong

Figure 1 for TabularMark: Watermarking Tabular Datasets for Machine Learning

Figure 2 for TabularMark: Watermarking Tabular Datasets for Machine Learning

Figure 3 for TabularMark: Watermarking Tabular Datasets for Machine Learning

Figure 4 for TabularMark: Watermarking Tabular Datasets for Machine Learning

Abstract:Watermarking is broadly utilized to protect ownership of shared data while preserving data utility. However, existing watermarking methods for tabular datasets fall short on the desired properties (detectability, non-intrusiveness, and robustness) and only preserve data utility from the perspective of data statistics, ignoring the performance of downstream ML models trained on the datasets. Can we watermark tabular datasets without significantly compromising their utility for training ML models while preventing attackers from training usable ML models on attacked datasets? In this paper, we propose a hypothesis testing-based watermarking scheme, TabularMark. Data noise partitioning is utilized for data perturbation during embedding, which is adaptable for numerical and categorical attributes while preserving the data utility. For detection, a custom-threshold one proportion z-test is employed, which can reliably determine the presence of the watermark. Experiments on real-world and synthetic datasets demonstrate the superiority of TabularMark in detectability, non-intrusiveness, and robustness.

Via

Access Paper or Ask Questions

DPDR: Gradient Decomposition and Reconstruction for Differentially Private Deep Learning

Jun 04, 2024

Yixuan Liu, Li Xiong, Yuhan Liu, Yujie Gu, Ruixuan Liu, Hong Chen

Abstract:Differentially Private Stochastic Gradients Descent (DP-SGD) is a prominent paradigm for preserving privacy in deep learning. It ensures privacy by perturbing gradients with random noise calibrated to their entire norm at each training step. However, this perturbation suffers from a sub-optimal performance: it repeatedly wastes privacy budget on the general converging direction shared among gradients from different batches, which we refer as common knowledge, yet yields little information gain. Motivated by this, we propose a differentially private training framework with early gradient decomposition and reconstruction (DPDR), which enables more efficient use of the privacy budget. In essence, it boosts model utility by focusing on incremental information protection and recycling the privatized common knowledge learned from previous gradients at early training steps. Concretely, DPDR incorporates three steps. First, it disentangles common knowledge and incremental information in current gradients by decomposing them based on previous noisy gradients. Second, most privacy budget is spent on protecting incremental information for higher information gain. Third, the model is updated with the gradient reconstructed from recycled common knowledge and noisy incremental information. Theoretical analysis and extensive experiments show that DPDR outperforms state-of-the-art baselines on both convergence rate and accuracy.

* 14 pages

Via

Access Paper or Ask Questions

Differentially Private Tabular Data Synthesis using Large Language Models

Jun 03, 2024

Toan V. Tran, Li Xiong

Figure 1 for Differentially Private Tabular Data Synthesis using Large Language Models

Figure 2 for Differentially Private Tabular Data Synthesis using Large Language Models

Figure 3 for Differentially Private Tabular Data Synthesis using Large Language Models

Figure 4 for Differentially Private Tabular Data Synthesis using Large Language Models

Abstract:Synthetic tabular data generation with differential privacy is a crucial problem to enable data sharing with formal privacy. Despite a rich history of methodological research and development, developing differentially private tabular data generators that can provide realistic synthetic datasets remains challenging. This paper introduces DP-LLMTGen -- a novel framework for differentially private tabular data synthesis that leverages pretrained large language models (LLMs). DP-LLMTGen models sensitive datasets using a two-stage fine-tuning procedure with a novel loss function specifically designed for tabular data. Subsequently, it generates synthetic data through sampling the fine-tuned LLMs. Our empirical evaluation demonstrates that DP-LLMTGen outperforms a variety of existing mechanisms across multiple datasets and privacy settings. Additionally, we conduct an ablation study and several experimental analyses to deepen our understanding of LLMs in addressing this important problem. Finally, we highlight the controllable generation ability of DP-LLMTGen through a fairness-constrained generation setting.

Via

Access Paper or Ask Questions

HRNet: Differentially Private Hierarchical and Multi-Resolution Network for Human Mobility Data Synthesization

May 13, 2024

Shun Takagi, Li Xiong, Fumiyuki Kato, Yang Cao, Masatoshi Yoshikawa

Figure 1 for HRNet: Differentially Private Hierarchical and Multi-Resolution Network for Human Mobility Data Synthesization

Figure 2 for HRNet: Differentially Private Hierarchical and Multi-Resolution Network for Human Mobility Data Synthesization

Figure 3 for HRNet: Differentially Private Hierarchical and Multi-Resolution Network for Human Mobility Data Synthesization

Figure 4 for HRNet: Differentially Private Hierarchical and Multi-Resolution Network for Human Mobility Data Synthesization

Abstract:Human mobility data offers valuable insights for many applications such as urban planning and pandemic response, but its use also raises privacy concerns. In this paper, we introduce the Hierarchical and Multi-Resolution Network (HRNet), a novel deep generative model specifically designed to synthesize realistic human mobility data while guaranteeing differential privacy. We first identify the key difficulties inherent in learning human mobility data under differential privacy. In response to these challenges, HRNet integrates three components: a hierarchical location encoding mechanism, multi-task learning across multiple resolutions, and private pre-training. These elements collectively enhance the model's ability under the constraints of differential privacy. Through extensive comparative experiments utilizing a real-world dataset, HRNet demonstrates a marked improvement over existing methods in balancing the utility-privacy trade-off.

Via

Access Paper or Ask Questions

Cross-silo Federated Learning with Record-level Personalized Differential Privacy

Jan 30, 2024

Junxu Liu, Jian Lou, Li Xiong, Jinfei Liu, Xiaofeng Meng

Abstract:Federated learning enhanced by differential privacy has emerged as a popular approach to better safeguard the privacy of client-side data by protecting clients' contributions during the training process. Existing solutions typically assume a uniform privacy budget for all records and provide one-size-fits-all solutions that may not be adequate to meet each record's privacy requirement. In this paper, we explore the uncharted territory of cross-silo FL with record-level personalized differential privacy. We devise a novel framework named rPDP-FL, employing a two-stage hybrid sampling scheme with both client-level sampling and non-uniform record-level sampling to accommodate varying privacy requirements. A critical and non-trivial problem is to select the ideal per-record sampling probability q given the personalized privacy budget {\epsilon}. We introduce a versatile solution named Simulation-CurveFitting, allowing us to uncover a significant insight into the nonlinear correlation between q and {\epsilon} and derive an elegant mathematical model to tackle the problem. Our evaluation demonstrates that our solution can provide significant performance gains over the baselines that do not consider personalized privacy preservation.

* 12 pages, 7 figures, under review

Via

Access Paper or Ask Questions

Contrastive Unlearning: A Contrastive Approach to Machine Unlearning

Jan 19, 2024

Hong kyu Lee, Qiuchen Zhang, Carl Yang, Jian Lou, Li Xiong

Figure 1 for Contrastive Unlearning: A Contrastive Approach to Machine Unlearning

Figure 2 for Contrastive Unlearning: A Contrastive Approach to Machine Unlearning

Figure 3 for Contrastive Unlearning: A Contrastive Approach to Machine Unlearning

Figure 4 for Contrastive Unlearning: A Contrastive Approach to Machine Unlearning

Abstract:Machine unlearning aims to eliminate the influence of a subset of training samples (i.e., unlearning samples) from a trained model. Effectively and efficiently removing the unlearning samples without negatively impacting the overall model performance is still challenging. In this paper, we propose a contrastive unlearning framework, leveraging the concept of representation learning for more effective unlearning. It removes the influence of unlearning samples by contrasting their embeddings against the remaining samples so that they are pushed away from their original classes and pulled toward other classes. By directly optimizing the representation space, it effectively removes the influence of unlearning samples while maintaining the representations learned from the remaining samples. Experiments on a variety of datasets and models on both class unlearning and sample unlearning showed that contrastive unlearning achieves the best unlearning effects and efficiency with the lowest performance loss compared with the state-of-the-art algorithms.

Via

Access Paper or Ask Questions

Does Differential Privacy Prevent Backdoor Attacks in Practice?

Nov 10, 2023

Fereshteh Razmi, Jian Lou, Li Xiong

Figure 1 for Does Differential Privacy Prevent Backdoor Attacks in Practice?

Figure 2 for Does Differential Privacy Prevent Backdoor Attacks in Practice?

Figure 3 for Does Differential Privacy Prevent Backdoor Attacks in Practice?

Figure 4 for Does Differential Privacy Prevent Backdoor Attacks in Practice?

Abstract:Differential Privacy (DP) was originally developed to protect privacy. However, it has recently been utilized to secure machine learning (ML) models from poisoning attacks, with DP-SGD receiving substantial attention. Nevertheless, a thorough investigation is required to assess the effectiveness of different DP techniques in preventing backdoor attacks in practice. In this paper, we investigate the effectiveness of DP-SGD and, for the first time in literature, examine PATE in the context of backdoor attacks. We also explore the role of different components of DP algorithms in defending against backdoor attacks and will show that PATE is effective against these attacks due to the bagging structure of the teacher models it employs. Our experiments reveal that hyperparameters and the number of backdoors in the training dataset impact the success of DP algorithms. Additionally, we propose Label-DP as a faster and more accurate alternative to DP-SGD and PATE. We conclude that while Label-DP algorithms generally offer weaker privacy protection, accurate hyper-parameter tuning can make them more effective than DP methods in defending against backdoor attacks while maintaining model accuracy.

Via

Access Paper or Ask Questions