Daniel Kang

LLM Agents can Autonomously Exploit One-day Vulnerabilities
Apr 17, 2024
Richard Fang, Rohan Bindu, Akul Gupta, Daniel Kang

Trustless Audits without Revealing Data or Models
Apr 06, 2024
Suppakit Waiwitlikhit, Ion Stoica, Yi Sun, Tatsunori Hashimoto, Daniel Kang

A Safe Harbor for AI Evaluation and Red Teaming
Mar 07, 2024
Shayne Longpre, Sayash Kapoor, Kevin Klyman, Ashwin Ramaswami, Rishi Bommasani, Borhane Blili-Hamelin, Yangsibo Huang, Aviya Skowron, Zheng-Xin Yong, Suhas Kotha, Yi Zeng, Weiyan Shi, Xianjun Yang, Reid Southen, Alexander Robey, Patrick Chao, Diyi Yang, Ruoxi Jia, Daniel Kang, Sandy Pentland, Arvind Narayanan, Percy Liang, Peter Henderson

InjecAgent: Benchmarking Indirect Prompt Injections in Tool-Integrated Large Language Model Agents
Mar 05, 2024
Qiusi Zhan, Zhixiang Liang, Zifan Ying, Daniel Kang

LLM Agents can Autonomously Hack Websites
Feb 16, 2024
Richard Fang, Rohan Bindu, Akul Gupta, Qiusi Zhan, Daniel Kang

Removing RLHF Protections in GPT-4 via Fine-Tuning
Nov 10, 2023
Qiusi Zhan, Richard Fang, Rohan Bindu, Akul Gupta, Tatsunori Hashimoto, Daniel Kang

Identifying and Mitigating the Security Risks of Generative AI
Aug 28, 2023
Clark Barrett, Brad Boyd, Elie Bursztein, Nicholas Carlini, Brad Chen, Jihye Choi, Amrita Roy Chowdhury, Mihai Christodorescu, Anupam Datta, Soheil Feizi, Kathleen Fisher, Tatsunori Hashimoto, Dan Hendrycks, Somesh Jha, Daniel Kang, Florian Kerschbaum, Eric Mitchell, John Mitchell, Zulfikar Ramzan, Khawaja Shams, Dawn Song, Ankur Taly, Diyi Yang

Exploiting Programmatic Behavior of LLMs: Dual-Use Through Standard Security Attacks
Feb 11, 2023
Daniel Kang, Xuechen Li, Ion Stoica, Carlos Guestrin, Matei Zaharia, Tatsunori Hashimoto