Daniel Kang

Pervasive Annotation Errors Break Text-to-SQL Benchmarks and Leaderboards

Jan 13, 2026

AIA Forecaster: Technical Report

Nov 10, 2025

DRAMA: Unifying Data Retrieval and Analysis for Open-Domain Analytic Queries

Oct 31, 2025

Visual Backdoor Attacks on MLLM Embodied Decision Making via Contrastive Trigger Learning

Oct 31, 2025

ZKTorch: Compiling ML Inference to Zero-Knowledge Proofs via Parallel Proof Accumulation

Jul 09, 2025

Establishing Best Practices for Building Rigorous Agentic Benchmarks

Jul 03, 2025

Breaking Barriers: Do Reinforcement Post Training Gains Transfer To Unseen Domains?

Jun 24, 2025

UTBoost: Rigorous Evaluation of Coding Agents on SWE-Bench

Jun 10, 2025

ELT-Bench: An End-to-End Benchmark for Evaluating AI Agents on ELT Pipelines

Apr 07, 2025

CVE-Bench: A Benchmark for AI Agents' Ability to Exploit Real-World Web Application Vulnerabilities

Mar 21, 2025