Taojie Zhu

On-Policy Replay for Continual Supervised Fine-Tuning

May 28, 2026
Via arXiv

From Knowing to Doing: A Memory-Controlled Benchmark for LLM Trading Agents on Stock Markets

May 27, 2026
Via arXiv

Bridging SFT and RL: Dynamic Policy Optimization for Robust Reasoning

Apr 10, 2026
Via arXiv