Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Mehran Kafai

Web IP at Risk: Prevent Unauthorized Real-Time Retrieval by Large Language Models

May 19, 2025

Yisheng Zhong, Yizhu Wen, Junfeng Guo, Mehran Kafai, Heng Huang, Hanqing Guo, Zhuangdi Zhu

Figure 1 for Web IP at Risk: Prevent Unauthorized Real-Time Retrieval by Large Language Models

Figure 2 for Web IP at Risk: Prevent Unauthorized Real-Time Retrieval by Large Language Models

Figure 3 for Web IP at Risk: Prevent Unauthorized Real-Time Retrieval by Large Language Models

Figure 4 for Web IP at Risk: Prevent Unauthorized Real-Time Retrieval by Large Language Models

Abstract:Protecting cyber Intellectual Property (IP) such as web content is an increasingly critical concern. The rise of large language models (LLMs) with online retrieval capabilities presents a double-edged sword that enables convenient access to information but often undermines the rights of original content creators. As users increasingly rely on LLM-generated responses, they gradually diminish direct engagement with original information sources, significantly reducing the incentives for IP creators to contribute, and leading to a saturating cyberspace with more AI-generated content. In response, we propose a novel defense framework that empowers web content creators to safeguard their web-based IP from unauthorized LLM real-time extraction by leveraging the semantic understanding capability of LLMs themselves. Our method follows principled motivations and effectively addresses an intractable black-box optimization problem. Real-world experiments demonstrated that our methods improve defense success rates from 2.5% to 88.6% on different LLMs, outperforming traditional defenses such as configuration-based restrictions.

* 13 pages, 13 figures, 4 tables

Via

Access Paper or Ask Questions