
Yuxuan Zhu

ReViSQL: Achieving Human-Level Text-to-SQL

Mar 20, 2026
Via arXiv

Noisy Data is Destructive to Reinforcement Learning with Verifiable Rewards

Mar 17, 2026
Via arXiv

ManCAR: Manifold-Constrained Latent Reasoning with Adaptive Test-Time Computation for Sequential Recommendation

Feb 23, 2026
Via arXiv

Rethinking Generative Recommender Tokenizer: Recsys-Native Encoding and Semantic Quantization Beyond LLMs

Feb 02, 2026
Via arXiv

Kimi K2.5: Visual Agentic Intelligence

Feb 02, 2026
Via arXiv

Terminal-Bench: Benchmarking Agents on Hard, Realistic Tasks in Command Line Interfaces

Jan 17, 2026
Via arXiv

Pervasive Annotation Errors Break Text-to-SQL Benchmarks and Leaderboards

Jan 13, 2026
Via arXiv

Establishing Best Practices for Building Rigorous Agentic Benchmarks

Jul 03, 2025
Via arXiv

Breaking Barriers: Do Reinforcement Post Training Gains Transfer To Unseen Domains?

Jun 24, 2025
Via arXiv

UTBoost: Rigorous Evaluation of Coding Agents on SWE-Bench

Jun 10, 2025
Via arXiv