Daniel Kang

A Safe Harbor for AI Evaluation and Red Teaming

Mar 07, 2024
Shayne Longpre, Sayash Kapoor, Kevin Klyman, Ashwin Ramaswami, Rishi Bommasani, Borhane Blili-Hamelin, Yangsibo Huang, Aviya Skowron, Zheng-Xin Yong, Suhas Kotha, Yi Zeng, Weiyan Shi, Xianjun Yang, Reid Southen, Alexander Robey, Patrick Chao, Diyi Yang, Ruoxi Jia, Daniel Kang, Sandy Pentland, Arvind Narayanan, Percy Liang, Peter Henderson

InjecAgent: Benchmarking Indirect Prompt Injections in Tool-Integrated Large Language Model Agents

Mar 05, 2024
Qiusi Zhan, Zhixiang Liang, Zifan Ying, Daniel Kang

LLM Agents can Autonomously Hack Websites

Feb 16, 2024
Richard Fang, Rohan Bindu, Akul Gupta, Qiusi Zhan, Daniel Kang

Removing RLHF Protections in GPT-4 via Fine-Tuning

Nov 10, 2023
Qiusi Zhan, Richard Fang, Rohan Bindu, Akul Gupta, Tatsunori Hashimoto, Daniel Kang

Identifying and Mitigating the Security Risks of Generative AI

Aug 28, 2023
Clark Barrett, Brad Boyd, Ellie Burzstein, Nicholas Carlini, Brad Chen, Jihye Choi, Amrita Roy Chowdhury, Mihai Christodorescu, Anupam Datta, Soheil Feizi, Kathleen Fisher, Tatsunori Hashimoto, Dan Hendrycks, Somesh Jha, Daniel Kang, Florian Kerschbaum, Eric Mitchell, John Mitchell, Zulfikar Ramzan, Khawaja Shams, Dawn Song, Ankur Taly, Diyi Yang

Exploiting Programmatic Behavior of LLMs: Dual-Use Through Standard Security Attacks

Feb 11, 2023
Daniel Kang, Xuechen Li, Ion Stoica, Carlos Guestrin, Matei Zaharia, Tatsunori Hashimoto

Q-Diffusion: Quantizing Diffusion Models

Feb 10, 2023
Xiuyu Li, Long Lian, Yijiang Liu, Huanrui Yang, Zhen Dong, Daniel Kang, Shanghang Zhang, Kurt Keutzer

Scaling up Trustless DNN Inference with Zero-Knowledge Proofs

Oct 17, 2022
Daniel Kang, Tatsunori Hashimoto, Ion Stoica, Yi Sun
