Huan Zhang

DR-SAC: Distributionally Robust Soft Actor-Critic for Reinforcement Learning under Uncertainty

Jun 14, 2025
Via arXiv

GUARD: Guided Unlearning and Retention via Data Attribution for Large Language Models

Jun 12, 2025
Via arXiv

SDP-CROWN: Efficient Bound Propagation for Neural Network Verification with Tightness of Semidefinite Programming

Jun 07, 2025
Via arXiv

Improving Data Efficiency for LLM Reinforcement Fine-tuning Through Difficulty-targeted Online Data Selection and Rollout Replay

Jun 05, 2025
Via arXiv

AlphaOne: Reasoning Models Thinking Slow and Fast at Test Time

May 30, 2025
Via arXiv

Beyond Freezing: Sparse Tuning Enhances Plasticity in Continual Learning with Pre-Trained Models

May 26, 2025
Via arXiv

Rethinking Stateful Tool Use in Multi-Turn Dialogues: Benchmarks and Challenges

May 19, 2025
Via arXiv

Token-Level Uncertainty Estimation for Large Language Model Reasoning

May 16, 2025
Via arXiv

Efficient Uncertainty Estimation via Distillation of Bayesian Large Language Models

May 16, 2025
Via arXiv

From Aesthetics to Human Preferences: Comparative Perspectives of Evaluating Text-to-Music Systems

Apr 30, 2025
Via arXiv