Daniel Kang

Breaking Barriers: Do Reinforcement Post Training Gains Transfer To Unseen Domains?

Jun 24, 2025
Via arXiv

UTBoost: Rigorous Evaluation of Coding Agents on SWE-Bench

Jun 10, 2025
Via arXiv

ELT-Bench: An End-to-End Benchmark for Evaluating AI Agents on ELT Pipelines

Apr 07, 2025
Via arXiv

CVE-Bench: A Benchmark for AI Agents' Ability to Exploit Real-World Web Application Vulnerabilities

Mar 21, 2025
Via arXiv

MM-PoisonRAG: Disrupting Multimodal RAG with Local and Global Poisoning Attacks

Feb 25, 2025
Via arXiv

Voice-Enabled AI Agents can Perform Common Scams

Oct 21, 2024
Via arXiv

Teams of LLM Agents can Exploit Zero-Day Vulnerabilities

Jun 02, 2024
Via arXiv

LLM Agents can Autonomously Exploit One-day Vulnerabilities

Apr 11, 2024
Via arXiv

Trustless Audits without Revealing Data or Models

Apr 06, 2024
Via arXiv

A Safe Harbor for AI Evaluation and Red Teaming

Mar 07, 2024
Via arXiv