Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Wenjing Lou

Minim: Privacy-Aware Minimal View for Agents via Trusted Local Sanitization

Jun 11, 2026

Hexuan Yu, Chaoyu Zhang, Heng Jin, Shanghao Shi, Ning Zhang, Y. Thomas Hou, Wenjing Lou

Abstract:Modern LLM-powered autonomous agents increasingly rely on rich user interface (UI) state observations to achieve reliable action grounding in complex digital environments. However, many deployments transmit the full UI state to remote inference servers even when most elements are irrelevant to the current task, which can leak sensitive but unnecessary context such as authentication codes, private notifications, and background application states. We propose MINIM, a trusted local broker that performs privacy-aware minimization on the client side before any observation leaves the device. Grounded in Contextual Integrity (CI), MINIM learns a dual-score representation for each UI element by predicting an inherent sensitivity score (s) and a task-conditioned necessity score (n). These scores drive a ternary disclosure policy that keeps essential elements, abstracts sensitive attributes when needed, and removes task-irrelevant content. We optimize a CI-aware objective that penalizes necessity errors more strongly on high-risk content, enabling aggressive pruning while preserving task-critical information. Experiments on real-world UI observations derived from WebArena show that MINIM substantially reduces task-irrelevant sensitive leakage while preserving task-critical semantic context and the interactive affordances required for reliable agent actions.

* Accepted at ICML 2026 (43rd International Conference on Machine Learning, Seoul, South Korea). Code available at https://github.com/yyyyhx/MINIM

Via

Access Paper or Ask Questions

Harvesting Private Medical Images in Federated Learning Systems with Crafted Models

Jul 13, 2024

Shanghao Shi, Md Shahedul Haque, Abhijeet Parida, Marius George Linguraru, Y. Thomas Hou, Syed Muhammad Anwar, Wenjing Lou

Figure 1 for Harvesting Private Medical Images in Federated Learning Systems with Crafted Models

Figure 2 for Harvesting Private Medical Images in Federated Learning Systems with Crafted Models

Figure 3 for Harvesting Private Medical Images in Federated Learning Systems with Crafted Models

Figure 4 for Harvesting Private Medical Images in Federated Learning Systems with Crafted Models

Abstract:Federated learning (FL) allows a set of clients to collaboratively train a machine-learning model without exposing local training samples. In this context, it is considered to be privacy-preserving and hence has been adopted by medical centers to train machine-learning models over private data. However, in this paper, we propose a novel attack named MediLeak that enables a malicious parameter server to recover high-fidelity patient images from the model updates uploaded by the clients. MediLeak requires the server to generate an adversarial model by adding a crafted module in front of the original model architecture. It is published to the clients in the regular FL training process and each client conducts local training on it to generate corresponding model updates. Then, based on the FL protocol, the model updates are sent back to the server and our proposed analytical method recovers private data from the parameter updates of the crafted module. We provide a comprehensive analysis for MediLeak and show that it can successfully break the state-of-the-art cryptographic secure aggregation protocols, designed to protect the FL systems from privacy inference attacks. We implement MediLeak on the MedMNIST and COVIDx CXR-4 datasets. The results show that MediLeak can nearly perfectly recover private images with high recovery rates and quantitative scores. We further perform downstream tasks such as disease classification with the recovered data, where our results show no significant performance degradation compared to using the original training samples.

Via

Access Paper or Ask Questions

BoBa: Boosting Backdoor Detection through Data Distribution Inference in Federated Learning

Jul 12, 2024

Ning Wang, Shanghao Shi, Yang Xiao, Yimin Chen, Y. Thomas Hou, Wenjing Lou

Figure 1 for BoBa: Boosting Backdoor Detection through Data Distribution Inference in Federated Learning

Figure 2 for BoBa: Boosting Backdoor Detection through Data Distribution Inference in Federated Learning

Figure 3 for BoBa: Boosting Backdoor Detection through Data Distribution Inference in Federated Learning

Figure 4 for BoBa: Boosting Backdoor Detection through Data Distribution Inference in Federated Learning

Abstract:Federated learning, while being a promising approach for collaborative model training, is susceptible to poisoning attacks due to its decentralized nature. Backdoor attacks, in particular, have shown remarkable stealthiness, as they selectively compromise predictions for inputs containing triggers. Previous endeavors to detect and mitigate such attacks are based on the Independent and Identically Distributed (IID) data assumption where benign model updates exhibit high-level similarity in multiple feature spaces due to IID data. Thus, outliers are detected as backdoor attacks. Nevertheless, non-IID data presents substantial challenges in backdoor attack detection, as the data variety introduces variance among benign models, making outlier detection-based mechanisms less effective. We propose a novel distribution-aware anomaly detection mechanism, BoBa, to address this problem. In order to differentiate outliers arising from data variety versus backdoor attack, we propose to break down the problem into two steps: clustering clients utilizing their data distribution followed by a voting-based detection. Based on the intuition that clustering and subsequent backdoor detection can drastically benefit from knowing client data distributions, we propose a novel data distribution inference mechanism. To improve detection robustness, we introduce an overlapping clustering method, where each client is associated with multiple clusters, ensuring that the trustworthiness of a model update is assessed collectively by multiple clusters rather than a single cluster. Through extensive evaluations, we demonstrate that BoBa can reduce the attack success rate to lower than 0.001 while maintaining high main task accuracy across various attack strategies and experimental settings.

Via

Access Paper or Ask Questions

NoiSec: Harnessing Noise for Security against Adversarial and Backdoor Attacks

Jun 18, 2024

Md Hasan Shahriar, Ning Wang, Y. Thomas Hou, Wenjing Lou

Figure 1 for NoiSec: Harnessing Noise for Security against Adversarial and Backdoor Attacks

Figure 2 for NoiSec: Harnessing Noise for Security against Adversarial and Backdoor Attacks

Figure 3 for NoiSec: Harnessing Noise for Security against Adversarial and Backdoor Attacks

Figure 4 for NoiSec: Harnessing Noise for Security against Adversarial and Backdoor Attacks

Abstract:The exponential adoption of machine learning (ML) is propelling the world into a future of intelligent automation and data-driven solutions. However, the proliferation of malicious data manipulation attacks against ML, namely adversarial and backdoor attacks, jeopardizes its reliability in safety-critical applications. The existing detection methods against such attacks are built upon assumptions, limiting them in diverse practical scenarios. Thus, motivated by the need for a more robust and unified defense mechanism, we investigate the shared traits of adversarial and backdoor attacks and propose NoiSec that leverages solely the noise, the foundational root cause of such attacks, to detect any malicious data alterations. NoiSec is a reconstruction-based detector that disentangles the noise from the test input, extracts the underlying features from the noise, and leverages them to recognize systematic malicious manipulation. Experimental evaluations conducted on the CIFAR10 dataset demonstrate the efficacy of NoiSec, achieving AUROC scores exceeding 0.954 and 0.852 under white-box and black-box adversarial attacks, respectively, and 0.992 against backdoor attacks. Notably, NoiSec maintains a high detection performance, keeping the false positive rate within only 1\%. Comparative analyses against MagNet-based baselines reveal NoiSec's superior performance across various attack scenarios.

* 20 pages, 7 figures

Via

Access Paper or Ask Questions

ProFLingo: A Fingerprinting-based Copyright Protection Scheme for Large Language Models

May 03, 2024

Heng Jin, Chaoyu Zhang, Shanghao Shi, Wenjing Lou, Y. Thomas Hou

Figure 1 for ProFLingo: A Fingerprinting-based Copyright Protection Scheme for Large Language Models

Figure 2 for ProFLingo: A Fingerprinting-based Copyright Protection Scheme for Large Language Models

Figure 3 for ProFLingo: A Fingerprinting-based Copyright Protection Scheme for Large Language Models

Figure 4 for ProFLingo: A Fingerprinting-based Copyright Protection Scheme for Large Language Models

Abstract:Large language models (LLMs) have attracted significant attention in recent years. Due to their "Large" nature, training LLMs from scratch consumes immense computational resources. Since several major players in the artificial intelligence (AI) field have open-sourced their original LLMs, an increasing number of individual researchers and smaller companies are able to build derivative LLMs based on these open-sourced models at much lower costs. However, this practice opens up possibilities for unauthorized use or reproduction that may not comply with licensing agreements, and deriving models can change the model's behavior, thus complicating the determination of model ownership. Current copyright protection schemes for LLMs are either designed for white-box settings or require additional modifications to the original model, which restricts their use in real-world settings. In this paper, we propose ProFLingo, a black-box fingerprinting-based copyright protection scheme for LLMs. ProFLingo generates adversarial examples (AEs) that can represent the unique decision boundary characteristics of an original model, thereby establishing unique fingerprints. Our scheme checks the effectiveness of these adversarial examples on a suspect model to determine whether it has been derived from the original model. ProFLingo offers a non-invasive approach, which neither requires knowledge of the suspect model nor modifications to the base model or its training process. To the best of our knowledge, our method represents the first black-box fingerprinting technique for copyright protection for LLMs. Our source code and generated AEs are available at: https://github.com/hengvt/ProFLingo_arXiv.

* This is the author's pre-print version of the work. It is posted here for your personal use. Not for redistribution

Via

Access Paper or Ask Questions

Scale-MIA: A Scalable Model Inversion Attack against Secure Federated Learning via Latent Space Reconstruction

Nov 14, 2023

Shanghao Shi, Ning Wang, Yang Xiao, Chaoyu Zhang, Yi Shi, Y. Thomas Hou, Wenjing Lou

Figure 1 for Scale-MIA: A Scalable Model Inversion Attack against Secure Federated Learning via Latent Space Reconstruction

Figure 2 for Scale-MIA: A Scalable Model Inversion Attack against Secure Federated Learning via Latent Space Reconstruction

Figure 3 for Scale-MIA: A Scalable Model Inversion Attack against Secure Federated Learning via Latent Space Reconstruction

Figure 4 for Scale-MIA: A Scalable Model Inversion Attack against Secure Federated Learning via Latent Space Reconstruction

Abstract:Federated learning is known for its capability to safeguard participants' data privacy. However, recently emerged model inversion attacks (MIAs) have shown that a malicious parameter server can reconstruct individual users' local data samples through model updates. The state-of-the-art attacks either rely on computation-intensive search-based optimization processes to recover each input batch, making scaling difficult, or they involve the malicious parameter server adding extra modules before the global model architecture, rendering the attacks too conspicuous and easily detectable. To overcome these limitations, we propose Scale-MIA, a novel MIA capable of efficiently and accurately recovering training samples of clients from the aggregated updates, even when the system is under the protection of a robust secure aggregation protocol. Unlike existing approaches treating models as black boxes, Scale-MIA recognizes the importance of the intricate architecture and inner workings of machine learning models. It identifies the latent space as the critical layer for breaching privacy and decomposes the complex recovery task into an innovative two-step process to reduce computation complexity. The first step involves reconstructing the latent space representations (LSRs) from the aggregated model updates using a closed-form inversion mechanism, leveraging specially crafted adversarial linear layers. In the second step, the whole input batches are recovered from the LSRs by feeding them into a fine-tuned generative decoder. We implemented Scale-MIA on multiple commonly used machine learning models and conducted comprehensive experiments across various settings. The results demonstrate that Scale-MIA achieves excellent recovery performance on different datasets, exhibiting high reconstruction rates, accuracy, and attack efficiency on a larger scale compared to state-of-the-art MIAs.

Via

Access Paper or Ask Questions

CANShield: Signal-based Intrusion Detection for Controller Area Networks

May 03, 2022

Md Hasan Shahriar, Yang Xiao, Pablo Moriano, Wenjing Lou, Y. Thomas Hou

Figure 1 for CANShield: Signal-based Intrusion Detection for Controller Area Networks

Figure 2 for CANShield: Signal-based Intrusion Detection for Controller Area Networks

Figure 3 for CANShield: Signal-based Intrusion Detection for Controller Area Networks

Figure 4 for CANShield: Signal-based Intrusion Detection for Controller Area Networks

Abstract:Modern vehicles rely on a fleet of electronic control units (ECUs) connected through controller area network (CAN) buses for critical vehicular control. However, with the expansion of advanced connectivity features in automobiles and the elevated risks of internal system exposure, the CAN bus is increasingly prone to intrusions and injection attacks. The ordinary injection attacks disrupt the typical timing properties of the CAN data stream, and the rule-based intrusion detection systems (IDS) can easily detect them. However, advanced attackers can inject false data to the time series sensory data (signal), while looking innocuous by the pattern/frequency of the CAN messages. Such attacks can bypass the rule-based IDS or any anomaly-based IDS built on binary payload data. To make the vehicles robust against such intelligent attacks, we propose CANShield, a signal-based intrusion detection framework for the CAN bus. CANShield consists of three modules: a data preprocessing module that handles the high-dimensional CAN data stream at the signal level and makes them suitable for a deep learning model; a data analyzer module consisting of multiple deep autoencoder (AE) networks, each analyzing the time-series data from a different temporal perspective; and finally an attack detection module that uses an ensemble method to make the final decision. Evaluation results on two high-fidelity signal-based CAN attack datasets show the high accuracy and responsiveness of CANShield in detecting wide-range of advanced intrusion attacks.

* 15 pages, 6 figures, A version of this paper is accepted by escar USA 2022

Via

Access Paper or Ask Questions