Huan Zhang

DR-SAC: Distributionally Robust Soft Actor-Critic for Reinforcement Learning under Uncertainty

Jun 14, 2025

GUARD: Guided Unlearning and Retention via Data Attribution for Large Language Models

Jun 12, 2025

SDP-CROWN: Efficient Bound Propagation for Neural Network Verification with Tightness of Semidefinite Programming

Jun 07, 2025

Improving Data Efficiency for LLM Reinforcement Fine-tuning Through Difficulty-targeted Online Data Selection and Rollout Replay

Jun 05, 2025

AlphaOne: Reasoning Models Thinking Slow and Fast at Test Time

May 30, 2025

Beyond Freezing: Sparse Tuning Enhances Plasticity in Continual Learning with Pre-Trained Models

May 26, 2025

Rethinking Stateful Tool Use in Multi-Turn Dialogues: Benchmarks and Challenges

May 19, 2025

Efficient Uncertainty Estimation via Distillation of Bayesian Large Language Models

May 16, 2025

Token-Level Uncertainty Estimation for Large Language Model Reasoning

May 16, 2025

From Aesthetics to Human Preferences: Comparative Perspectives of Evaluating Text-to-Music Systems

Apr 30, 2025